Abstract

Berry Esseen type bounds to the normal, based on zero- and size-bias couplings, 
are derived using Stein's method. The zero biasing bounds are illustrated with an 
application to combinatorial central limit theorems where the random permutation 
has either the uniform distribution or one which is constant over permutations with 
the same cycle type and having no fixed points. The size biasing bounds are applied 
to the occurrences of fixed relatively ordered sub-sequences (such as rising sequences) 
in a random permutation, and to the occurrences of patterns, extreme values, and 
subgraphs on finite graphs. 

1 Introduction 

Berry Esseen type bounds for normal approximation are developed using Stein's method, 
based on zero and size bias couplings. The results are applied to bound the proximity to the 
normal in combinatorial central limit theorems where the random permutation has either a 
uniform distribution, or one which is constant over permutations with the same cycle type, 
with no fixed points; to counting the number of occurrences of fixed, relatively ordered sub- 
sequences, such as rising sequences, in a random permutation; and to counting on finite 
graphs the number of occurrences of patterns, local extremes, and subgraphs. 

Stein's method ([2Ej, [2H]) uses characterizing equations to obtain bounds on the error
when approximating distributions by a given target. For the normal j2ZI, $X \sim \mathcal{N}(\mu, \sigma^2)$ if
and only if

$$E(X-\mu)f(X) = \sigma^2 E f'(X) \tag{1}$$

for all absolutely continuous $f$ for which $E|f'(X)| < \infty$. From such a characterizing
equation, a difference or differential equation can be set up to bound the difference between the
expectation of a test function $h$ evaluated on a given variable $Y$, and on the variable $X$
having the target distribution. For the normal, with $X$ having the same mean $\mu$ and variance
$\sigma^2$ as $Y$, the characterizing equation leads to the differential equation

$$h((y-\mu)/\sigma) - Nh = \sigma^2 f'(y) - (y-\mu)f(y) \tag{2}$$

where $Nh = Eh(Z)$ with $Z \sim \mathcal{N}(0,1)$, the standard normal mean of the test function $h$. At
$Y$, the expectation of the left hand side can be evaluated by calculating the expectation of
the right hand side using the bounded solution $f$ of (2) for the given $h$. By this device, Stein's
method can handle various kinds of dependence through the use of coupling constructions.

We consider and compare two couplings of a given $Y$ to achieve normal bounds. First,
for $Y$ with mean zero and variance $\sigma^2 \in (0, \infty)$, we say that $Y^*$ has the Y-zero biased
distribution if

$$EYf(Y) = \sigma^2 Ef'(Y^*) \tag{3}$$

for all absolutely continuous functions $f$ for which the expectation of either side exists. This
'zero bias transformation' from $Y$ to $Y^*$ was introduced in ^Sj, and it was shown there that
$Y^*$ exists for every mean zero $Y$ with finite variance. Similarly, for $Y$ non-negative with
finite mean $EY = \mu$, we say that $Y^s$ has the Y-size biased distribution if

$$EYf(Y) = \mu Ef(Y^s) \tag{4}$$

for all $f$ for which the expectation of either side exists. The size biased distribution exists
for any non-negative $Y$ with finite mean, and was used for normal approximation in [17].

A coupling $(Y, Y^*)$ where $Y^*$ has the Y-zero biased distribution lends itself for use in the
Stein equation (2) in the following way; by (3), with $\sigma^2 = 1$ say, we have

$$Eh(Y) - Nh = E[f'(Y) - Yf(Y)] = E[f'(Y) - f'(Y^*)]. \tag{5}$$

Therefore, the difference between $Y$ and the normal, as tested on $h$, equals the difference
between $Y$ and $Y^*$, as tested on $f'$. Additionally, as observed in [THj and seen directly from
(1), $Y$ is normal if and only if $Y =_d Y^*$. It is therefore natural that the distance from $Y$
to the normal can be expressed in terms of the distance from $Y$ to $Y^*$. Theorem 1.1 makes
this statement precise, showing that the distance from the standardized $Y$ to the normal as
measured by $\delta$ in (6) depends on the distribution of $Y$ only through a bound on $|Y^* - Y|$.
A similar phenomenon is seen in ^3] with $d_W$ the Wasserstein distance, where it is shown
that, for any mean zero, variance $\sigma^2$ variable $Y$, and $X \sim \mathcal{N}(0, \sigma^2)$,

$$d_W(Y, X) \le 2 d_W(Y, Y^*).$$

The use of size bias couplings in the Stein equation in (jU7|), (jUH|) and subsequent
calculations depends on the following identity, which is applied in a less direct manner than (5);
for $Y \ge 0$ with mean $\mu$ and variance $\sigma^2$,

$$E(Y-\mu)f(Y) = \mu E\left(f(Y^s) - f(Y)\right) \quad\text{and therefore}\quad \sigma^2 = \mu E(Y^s - Y).$$

With $W = (Y-\mu)/\sigma$, many authors (e.g. [ZI, [12], 0, El, ESI, HOI) have been successful
in obtaining bounds on the distance

$$\delta = \sup_{h \in \mathcal{H}} |Eh(W) - Nh| \tag{6}$$

to the normal, over classes of non-smooth functions $\mathcal{H}$, using Stein's method. Here we take
the smoothing inequality approach, following [23]. In particular, $\mathcal{H}$ is a class of measurable
functions on the real line such that

(i) The functions $h \in \mathcal{H}$ are uniformly bounded in absolute value by a constant, which we
take to be 1 without loss of generality,

(ii) For any real numbers $c$ and $d$, and for any $h(x) \in \mathcal{H}$, the function $h(cx + d) \in \mathcal{H}$,

(iii) For any $\epsilon > 0$ and $h \in \mathcal{H}$, the functions $h_\epsilon^+$, $h_\epsilon^-$ are also in $\mathcal{H}$, and

$$E\tilde{h}_\epsilon(Z) \le a\epsilon \tag{7}$$

for some constant $a$ which depends only on the class $\mathcal{H}$, where

$$h_\epsilon^+(x) = \sup_{|y| \le \epsilon} h(x+y), \quad h_\epsilon^-(x) = \inf_{|y| \le \epsilon} h(x+y), \quad\text{and}\quad \tilde{h}_\epsilon(x) = h_\epsilon^+(x) - h_\epsilon^-(x). \tag{8}$$

The collection of indicators of all half lines, and indicators of all intervals, for example,
each form classes $\mathcal{H}$, which satisfy (7) and (8) with $a = \sqrt{2/\pi}$ and $a = 2\sqrt{2/\pi}$ respectively
(see e.g. jZHl).

Since the bound on $\delta$ in Theorem 1.1 depends only on the size of $|Y^* - Y|$, it may be computed
without the need for the calculation of the variances of certain conditional expectations that
arise in other versions of Stein's method, $\sqrt{\mathrm{Var}\{E[(Y' - Y)^2|Y]\}}$ for the exchangeable pair
method, or the term (13) for the size bias coupling studied here.

Theorem 1.1 Let $Y$ be a mean zero random variable with variance $\sigma^2 \in (0, \infty)$, and $Y^*$ be
defined on the same space having the Y-zero biased distribution. If $|Y^* - Y| \le 2B$ for some
$B \le \sigma/24$, then for $\delta$ as in (6) and $a$ as in (7)

$$\delta \le A(37 + 12A + 112a), \tag{9}$$

for $A = 2B/\sigma$. For indicators of all half lines, and the indicators of all intervals, using
$a = \sqrt{2/\pi}$ and $a = 2\sqrt{2/\pi}$, we have respectively

$$\delta \le A(127 + 12A) \quad\text{and}\quad \delta \le A(216 + 12A). \tag{10}$$

See (75) and (76) for some variations on the bound (9) here, and (12) below, respectively. We
note that Theorem 1.1 immediately provides a bound on $\delta$ of order $\sigma^{-1}$ whenever $|Y^* - Y|$
is bounded. In Section 2 we apply Theorem 1.1 to random variables of the form

$$Y = \sum_{i=1}^n a_{i,\pi(i)}, \tag{11}$$

depending on a fixed array of real numbers $\{a_{ij}\}_{i,j=1}^n$ and a random permutation $\pi \in S_n$, the
symmetric group. In Section 2.1 we consider $\pi$ having the uniform distribution on $S_n$, and
in Section 2.2 distributions constant on cycle type having no fixed points (conditions (25)
and (27) respectively).
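The bounds (9) and (10) are simple to evaluate once a coupling bound $B$ and the standard deviation $\sigma$ are known. The following is a minimal sketch, not part of the original paper; the function name `zero_bias_bound` and the two constants are hypothetical names chosen for illustration, and the admissibility check assumes the condition $B \le \sigma/24$ as read from the theorem statement.

```python
import math

# Smoothness constants a quoted in the text for two classes of test functions.
A_HALF_LINES = math.sqrt(2 / math.pi)       # indicators of half lines
A_INTERVALS = 2 * math.sqrt(2 / math.pi)    # indicators of intervals


def zero_bias_bound(B: float, sigma: float, a: float) -> float:
    """Right hand side of (9): A * (37 + 12 A + 112 a) with A = 2B / sigma.

    Illustrative helper (hypothetical name). It only evaluates the formula and
    checks the assumed admissibility condition 0 <= B <= sigma / 24.
    """
    if sigma <= 0 or B < 0 or B > sigma / 24:
        raise ValueError("Theorem 1.1 is stated for 0 <= B <= sigma/24, sigma > 0")
    A = 2 * B / sigma
    return A * (37 + 12 * A + 112 * a)


if __name__ == "__main__":
    # Example values (assumed, for illustration only): B = 1, sigma = 100.
    print(zero_bias_bound(1.0, 100.0, A_HALF_LINES))
    print(zero_bias_bound(1.0, 100.0, A_INTERVALS))
```

Since $37 + 112\sqrt{2/\pi}$ is just below 127 and $37 + 224\sqrt{2/\pi}$ is just below 216, the values returned for the two classes never exceed the rounded forms in (10).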

For a size bias coupling $(Y, Y^s)$, Theorem 1.2 gives a bound on $\delta$ which depends on the
size of $|Y^s - Y|$, and additionally on $\Delta$ in (13). While $\Delta$ may be difficult to calculate
precisely in many cases, size bias couplings can be more easily constructed for a broader
range of examples than the zero bias couplings.

Theorem 1.2 Let $Y \ge 0$ be a random variable with mean $\mu$ and variance $\sigma^2 \in (0, \infty)$, and
let $Y^s$ be defined on the same space, with the Y-size biased distribution. If $|Y^s - Y| \le B$ for
some $B \le \sigma^{3/2}/\sqrt{6\mu}$, then for $\delta$ as in (6) and $a$ as in (7)

$$\delta \le \frac{aA}{2} + \frac{\mu}{\sigma}\left((19 + 56a)A^2 + 4A^3\right) + \frac{23\mu}{\sigma^2}\Delta \tag{12}$$

where

$$\Delta = \sqrt{\mathrm{Var}(E(Y^s - Y|Y))} \tag{13}$$

and $A = B/\sigma$. For indicator functions of all half lines and indicator functions of all
intervals, by using $a = \sqrt{2/\pi}$ and $a = 2\sqrt{2/\pi}$, we respectively find that

$$\delta \le 0.4A + \frac{\mu}{\sigma}(64A^2 + 4A^3) + \frac{23\mu}{\sigma^2}\Delta \quad\text{and}\quad \delta \le 0.8A + \frac{\mu}{\sigma}(109A^2 + 4A^3) + \frac{23\mu}{\sigma^2}\Delta.$$

If the mean $\mu$ is of order $\sigma^2$, $B$ is bounded and $\Delta$ is of order $\sigma^{-1}$, then $\delta$ will have order $O(\sigma^{-1})$.
The application of Theorem 1.2 to counting the occurrences of fixed relatively ordered
sub-sequences, such as rising sequences, in a random permutation, and to counting the
occurrences of color patterns, local maxima, and sub-graphs in finite graphs is illustrated in Section
3. The proofs of Theorems 1.2 and 1.1 are given in Section |3J

Nothing should be inferred from the fact that the zero bias applications presented here 
involve global dependence, and that the dependence in the examples used to illustrate the size 
bias approach is local; the exchangeable pair coupling on which our zero biased constructions 
are based can also be applied in cases of local dependence, and the size bias approach was 
applied in [17] to variables having global dependence.

In both zero and size biasing, a sum $Y = \sum_{\alpha \in A} X_\alpha$ of independent variables on a finite
index set $A$ is biased by choosing a summand at random and replacing it with its biased
version. To describe the zero biasing coupling, let $\{X_\alpha\}_{\alpha \in A}$ be a collection of mean zero
variables with finite variance, and $I$ an independent random index with distribution

$$P(I = \alpha) = \frac{w_\alpha}{\sum_{\beta \in A} w_\beta} \tag{14}$$

where $w_\alpha = \mathrm{Var}(X_\alpha)$. It was shown in ^3] that replacing $X_I$ by a variable $X_I^*$ having the
$X_I$-zero bias distribution, independent of $\{X_\alpha, \alpha \ne I\}$, gives

$$Y^* = Y - X_I + X_I^*, \tag{15}$$

a variable having the Y-zero biased distribution. Hence, when a sum of many independent
variables of the same order is coupled this way to its zero biased version, the magnitude of
$(Y^* - Y)/\sqrt{\mathrm{Var}(Y)}$, and therefore of distance measures such as $\delta$, are small.
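The replacement construction in (14) and (15) can be checked exactly on a toy case. The sketch below is an illustration added here, not taken from the paper. It assumes summands $X_i = c_i\epsilon_i$ with independent signs $\epsilon_i = \pm 1$, and uses the standard fact that the zero bias distribution of a symmetric $\pm c$ variable is uniform on $[-c, c]$. It then compares $EYf(Y)$ with $\sigma^2 Ef'(Y^*)$ for monomials $f(y) = y^k$ in exact rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def uniform_moment(j: int) -> Fraction:
    """E U^j for U uniform on [-1, 1]."""
    return Fraction(1, j + 1) if j % 2 == 0 else Fraction(0)


def lhs(c, k):
    """E[Y f(Y)] with f(y) = y^k and Y = sum_i c_i * eps_i, eps_i = +-1 fair signs."""
    n = len(c)
    total = Fraction(0)
    for signs in product((-1, 1), repeat=n):
        y = sum(s * ci for s, ci in zip(signs, c))
        total += Fraction(y) ** (k + 1)
    return total / 2 ** n


def rhs(c, k):
    """sigma^2 E f'(Y*) with Y* = Y - X_I + X_I^*, following (14) and (15)."""
    n = len(c)
    var = sum(ci * ci for ci in c)              # sigma^2 = sum of Var(X_i) = c_i^2
    total = Fraction(0)
    for i, ci in enumerate(c):
        p_i = Fraction(ci * ci, var)            # (14) with w_i = Var(X_i)
        others = c[:i] + c[i + 1:]
        for signs in product((-1, 1), repeat=n - 1):
            s = sum(e * cj for e, cj in zip(signs, others))
            m = k - 1                           # f'(y) = k * y^(k-1)
            # E (s + c_i U)^m expanded by the binomial theorem.
            e_pow = sum(
                comb(m, j) * Fraction(s) ** (m - j) * Fraction(ci) ** j * uniform_moment(j)
                for j in range(m + 1)
            )
            total += p_i * k * e_pow / 2 ** (n - 1)
    return var * total


if __name__ == "__main__":
    c = [1, 2, 3]
    for k in range(1, 7):
        print(k, lhs(c, k), rhs(c, k))
```

The two columns printed agree for every $k$, which is the characterizing identity (3) restricted to polynomial test functions.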

The construction of the size biased coupling in the independent case is similar. Let
$\{X_\alpha\}_{\alpha \in A}$ be a collection of non-negative variables with finite mean. Then, with $I$ a random
index independent of all other variables, having distribution (14) with $w_\alpha = EX_\alpha$, the
replacement of $X_I$ by a variable $X_I^s$ with the $X_I$-size bias distribution, independent of the
remaining variables, gives a variable with the Y-size biased distribution.
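As a small illustration of this construction (an added sketch, not from the paper), take independent Bernoulli summands: the size biased version of a Bernoulli variable is the constant 1, so the coupling replaces a summand, chosen with probability proportional to its mean, by 1. The code below builds the resulting law exactly and compares it with the defining relation (4); the helper names are hypothetical.

```python
from fractions import Fraction


def bernoulli_sum_law(ps):
    """Exact law of a sum of independent Bernoulli(p) variables, as {value: prob}."""
    law = {0: Fraction(1)}
    for p in ps:
        new = {}
        for y, w in law.items():
            new[y] = new.get(y, 0) + w * (1 - p)
            new[y + 1] = new.get(y + 1, 0) + w * p
        law = new
    return law


def size_biased_law(ps):
    """Law of Y^s = Y - X_I + 1, where P(I = i) is proportional to p_i = E X_i."""
    mu = sum(ps)
    out = {}
    for i, p in enumerate(ps):
        rest = bernoulli_sum_law(ps[:i] + ps[i + 1:])   # summands other than X_i
        for y, w in rest.items():
            out[y + 1] = out.get(y + 1, 0) + (p / mu) * w
    return out


if __name__ == "__main__":
    ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]
    law, sb, mu = bernoulli_sum_law(ps), size_biased_law(ps), sum(ps)
    f = lambda y: Fraction(1, 1 + y)                     # an arbitrary test function
    print(sum(w * y * f(y) for y, w in law.items()))     # E Y f(Y)
    print(mu * sum(w * f(y) for y, w in sb.items()))     # mu E f(Y^s)
```

Both printed values coincide, and the constructed law equals $y P(Y = y)/\mu$ pointwise, which is the usual description of the size biased distribution.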

Zero biased couplings of $Y^*$ to a sum $Y$ of non-independent variables $X_1, \ldots, X_n$ are
presently not very well understood. A construction in the presence of the weak global
dependence of simple random sampling was given in . Based on a remark in ^3], we here
exploit a connection between the zero bias coupling and the exchangeable pair $(Y', Y'')$ of [2H]
with distribution $dP(y', y'')$ satisfying $E(Y''|Y') = (1 - \lambda)Y'$ for some $\lambda \in (0, 1]$; in particular,
we make use of a pair $(Y^\dagger, Y^\ddagger)$ with distribution proportional to $(y' - y'')^2 dP(y', y'')$.

The construction of $Y$ and $Y^s$ on a common space for the sum of non-independent
variables $X_1, \ldots, X_n$ is more direct, and was described in Lemma 2.1 of [17]; we choose a
summand with probability proportional to its expectation, replace it by one from its
size-biased distribution, and then adjust the remaining variables according to the conditional
distribution given the value of the newly chosen variable. This construction is applied in
Section 3 and a 'squared' zero biasing form of it in Section 2.

The mappings of a distribution $Y$ to its zero biased $Y^*$ or size biased $Y^s$ versions are
special cases of distributional transformations from $Y$ to some $Y^{(m)}$ which are specified by
a function $H$ and characterizing equation

$$EH(Y)f(Y) = \eta Ef^{(m)}(Y^{(m)}) \quad\text{for all smooth } f,$$

where $f^{(m)}$ denotes the $m$-th derivative of $f$, and $\eta$ is, necessarily, $(m!)^{-1}EH(Y)Y^m$ when
this expectation exists. The zero bias and size bias transformation correspond to $m = 1$ and
$H(x) = x$, and $m = 0$ and $H(x) = x^+$, respectively. In general, such a $Y^{(m)}$ exists when $H$
and $Y$ satisfy certain sign change and orthogonality properties, as discussed in [16].

2 Zero Biasing: Combinatorial Central Limit Theorems

In this section, we illustrate the use of Theorem 1.1 to obtain Berry Esseen bounds in
combinatorial central limit theorems, that is, for variables $Y$ as in (11). In Section 2.1 we
do so for permutations having the uniform distribution over the symmetric group and, in
Section 2.2, we do so for permutations with distribution constant on those having the same
cycle type, with no fixed points. First we present Proposition 2.1, which suggests a method
for the construction of zero bias couplings based on the existence of exchangeable pairs; its
statement appears in [TH].

Proposition 2.1 Let $Y'$ and $Y''$ be an exchangeable pair, with distribution $dP(y', y'')$ and
$\mathrm{Var}(Y') = \sigma^2 \in (0, \infty)$, which satisfies the linearity condition

$$E(Y''|Y') = (1 - \lambda)Y' \quad\text{for some } \lambda \in (0, 1]. \tag{16}$$

Then

$$EY' = 0 \quad\text{and}\quad E(Y' - Y'')^2 = 2\lambda\sigma^2, \tag{17}$$

and if $Y^\dagger$ and $Y^\ddagger$ have distribution

$$dP^\dagger(y', y'') = \frac{(y' - y'')^2}{E(Y' - Y'')^2}\, dP(y', y''), \tag{18}$$
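and $U \sim \mathcal{U}[0, 1]$ is independent of $Y^\dagger$ and $Y^\ddagger$, then the variable

$$Y^* = UY^\dagger + (1 - U)Y^\ddagger \quad\text{has the } Y' \text{ zero biased distribution.} \tag{19}$$

Proof: The claims in (17) follow from (16) and exchangeability. Hence we need only show
that $Y^*$ in (19) satisfies (3). For a differentiable test function $f$,

$$\sigma^2 Ef'(UY^\dagger + (1 - U)Y^\ddagger) = \sigma^2 E\left(\frac{f(Y^\dagger) - f(Y^\ddagger)}{Y^\dagger - Y^\ddagger}\right) = \sigma^2\, \frac{E(Y' - Y'')(f(Y') - f(Y''))}{E(Y' - Y'')^2}.$$

Now using (16) to obtain $EY''f(Y') = (1 - \lambda)EY'f(Y')$, expanding the numerator, and applying
exchangeability and (17) yields

$$\sigma^2\, \frac{E\left(Y'f(Y') - Y''f(Y') - Y'f(Y'') + Y''f(Y'')\right)}{2\lambda\sigma^2} = EY'f(Y'). \qquad \blacksquare$$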
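Proposition 2.1 can be verified exactly on a small discrete pair. The sketch below is an added illustration, not the paper's code. It assumes the pair of Example 2.4 with $n = 3$ summands taking values $-1$ and $2$ with probabilities $2/3$ and $1/3$ (a mean zero law chosen only for the example), enumerates the joint law of $(Y', Y'')$, and evaluates $\sigma^2 Ef'(Y^*)$ through the identity $Ef'(UY^\dagger + (1-U)Y^\ddagger) = E[(f(Y^\dagger) - f(Y^\ddagger))/(Y^\dagger - Y^\ddagger)]$ used in the proof. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

VALUES = {-1: Fraction(2, 3), 2: Fraction(1, 3)}   # assumed mean zero summand law
N = 3                                              # number of summands


def joint_law():
    """Exact law of (Y', Y'') with Y'' = Y' - X'_I + X''_I, I uniform on the indices."""
    law = {}
    for xs in product(VALUES, repeat=N):
        w = Fraction(1)
        for x in xs:
            w *= VALUES[x]
        y1 = sum(xs)
        for i in range(N):
            for x_new, p_new in VALUES.items():
                key = (y1, y1 - xs[i] + x_new)
                law[key] = law.get(key, 0) + w * p_new / N
    return law


def zero_bias_side(law, f, sigma2):
    """sigma^2 E f'(Y*) with Y* as in (19), computed from the square biased law (18)."""
    norm = sum(w * (y1 - y2) ** 2 for (y1, y2), w in law.items())
    total = sum(w * (y1 - y2) * (f(y1) - f(y2)) for (y1, y2), w in law.items())
    return sigma2 * total / norm


if __name__ == "__main__":
    law = joint_law()
    sigma2 = sum(w * y1 * y1 for (y1, _), w in law.items())
    for k in range(1, 6):
        f = lambda y, k=k: Fraction(y) ** k
        print(k, zero_bias_side(law, f, sigma2), sum(w * y1 * f(y1) for (y1, _), w in law.items()))
```

The normalizing constant computed inside `zero_bias_side` equals $2\lambda\sigma^2$ with $\lambda = 1/n$, in agreement with (17).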

Example 2.2 Given a mean zero finite variance $Y'$, let $Y''$ be an independent copy of $Y'$.
The pair $(Y', Y'')$ satisfies the conditions of Proposition 2.1 with $\lambda = 1$, and hence, $Y^*$ as
in (19) has the $Y'$ zero bias distribution with $(Y^\dagger, Y^\ddagger)$ as in (18). However, coupling $Y'$
close to $Y''$, so that $Y^\dagger$ is close to $Y^\ddagger$, causes $B$, and therefore the bound on $\delta$ of Theorem 1.1,
to be small.

Remark 2.3 The following construction of $(Y^\dagger, Y^\ddagger)$ suggested by Proposition 2.1 is similar
to the one used for size biasing (see Lemma 2.1 of [17] and Section 3). Given $Y'$, first
construct an exchangeable $Y''$ close to $Y'$ satisfying (16), and then independently construct
the variables appearing in the 'square biased' term $(Y' - Y'')^2$. Lastly, adjust the remaining
variables that make up $(Y', Y'')$ to have their original conditional distribution, given the
newly generated variables.

Example 2.4 Let $\{X_i', X_i''\}_{i=1,\ldots,n}$ be i.i.d. mean zero variables with finite variance, let
$Y' = \sum_{i=1}^n X_i'$, and let $I$ be an independent random index with uniform distribution over
$\{1, \ldots, n\}$. Letting $Y'' = Y' - X_I' + X_I''$, the pair $(Y', Y'')$ is exchangeable and satisfies the
conditions of Proposition 2.1 with $\lambda = 1/n$. Set $S = \sum_{i \ne I} X_i'$ and $(T', T'') = (X_I', X_I'')$.
Applying Example 2.2 to $(T', T'')$, and forming $(T^\dagger, T^\ddagger)$ independently of $\{X_j', X_j''\}_{j \ne I}$, gives
$UT^\dagger + (1 - U)T^\ddagger = X_I^*$. By their independence from $X_I', X_I''$, the variables $\{X_j', X_j''\}_{j \ne I}$ already have their
original conditional distribution, given $(T^\dagger, T^\ddagger)$; hence $Y^* = S + X_I^*$, in agreement with (15).

Applying this construction in the presence of dependence results in $S$, a function of the
variables which can be kept fixed, and variables $T'$, $T^\dagger$, $T^\ddagger$, on a joint space, such that

$$Y' = S + T', \quad Y^\dagger = S + T^\dagger, \quad\text{and}\quad Y^\ddagger = S + T^\ddagger. \tag{20}$$

When $T'$, $T^\dagger$ and $T^\ddagger$ are all bounded by $B$, (19) gives

$$|Y^* - Y'| = |UT^\dagger + (1 - U)T^\ddagger - T'| \le U|T^\dagger| + (1 - U)|T^\ddagger| + |T'| \le 2B. \tag{21}$$

Let an array of real numbers $\{a_{ij}\}_{i,j=1}^n$ satisfy

$$\sum_{j=1}^n a_{ij} = 0 \quad\text{for all } i, \quad\text{and set}\quad C = \max_{i,j} |a_{ij}|. \tag{22}$$

By replacing $Y$ in (11) by $Y - EY$ we assume, without loss of generality, that $Ea_{i,\pi(i)} = 0$ for
every $i$. In Theorem 2.5 below, where $\pi$ is uniformly distributed over $S_n$, this assumption
is equivalent to (22). In Theorem 2.6, since $\pi$ has no fixed points, by (27), without loss of
generality we have $a_{ii} = 0$ for all $i$ in (26). In addition, since the distribution of $\pi$ is constant
on permutations having the same cycle type, by (25), $Ea_{i,\pi(i)} = (1/(n - 1))\sum_{j \ne i} a_{ij}$;
the mean zero assumption is again equivalent to (22). Avoiding trivial cases, we also assume
that $\mathrm{Var}(Y) = \sigma^2 > 0$. For ease of notation we write $Y'$ and $\pi'$ interchangeably for $Y$ and
$\pi$, respectively, in the remainder of this section.

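In Sections 2.1 and 2.2 the construction above produces variables $Y'$, $Y^\dagger$ and $Y^\ddagger$, given
by (11) (with $\pi$ replaced by $\pi'$, $\pi^\dagger$ and $\pi^\ddagger$, respectively), and a set of indices $\mathcal{I}$ outside of
which these permutations agree, such that (20) holds with

$$S = \sum_{i \notin \mathcal{I}} a_{i,\pi'(i)}, \quad T' = \sum_{i \in \mathcal{I}} a_{i,\pi'(i)}, \quad T^\dagger = \sum_{i \in \mathcal{I}} a_{i,\pi^\dagger(i)}, \quad\text{and}\quad T^\ddagger = \sum_{i \in \mathcal{I}} a_{i,\pi^\ddagger(i)}. \tag{23}$$

Therefore $B$ in (21) can be set equal to $C$ in (22) times a worst case bound on the size
of $\mathcal{I}$. The specifications of $\pi'$, $\pi''$, $\pi^\dagger$, and $\pi^\ddagger$ are given in terms of transpositions $\tau_{ij}$, those
permutations satisfying $\tau_{ij}(i) = j$, $\tau_{ij}(j) = i$ and $\tau_{ij}(k) = k$ for all $k \notin \{i, j\}$.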



2.1 Uniform permutation distribution 

Many authors (e.g. ^U] 110]) have considered normal approximation to the distribution
of (11) when $\pi$ is a permutation chosen uniformly from $S_n$. In Theorem 2.5, the dependence
of $\delta$ on $C$ is not as refined as the bound in [7], which depends on an (unspecified) universal
constant times the normalized absolute third moments of the $\{a_{ij}\}_{i,j=1}^n$. Here, on the other
hand, an explicit constant is provided.

Theorem 2.5 With $n \ge 3$, let $\{a_{ij}\}_{i,j=1}^n$ satisfy (22) and let $\pi$ be a random permutation
with uniform distribution over $S_n$. Then, with $C$ as in (22), conclusions (9) and (10) of
Theorem 1.1 hold for the sum $Y = \sum_{i=1}^n a_{i,\pi(i)}$ with $A = 8C/\sigma$ when $A \le 1/12$.

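Proof: Given $\pi'$, take $(I, J)$ to be independent of $\pi'$, uniformly over all pairs with $1 \le I \ne
J \le n$, and set $\pi'' = \pi'\tau_{I,J}$. In particular, $\pi''(i) = \pi'(i)$ for $i \notin \{I, J\}$; the variables $Y'$ and
$Y''$, given by (11) with $\pi'$ and $\pi''$ respectively, are exchangeable; and

$$Y' - Y'' = (a_{I,\pi'(I)} + a_{J,\pi'(J)}) - (a_{I,\pi'(J)} + a_{J,\pi'(I)}). \tag{24}$$

The linearity condition (16) is satisfied with $\lambda = 2/(n - 1)$, since, from (24) and (22),

$$E(Y' - Y''|\pi') = 2\left(\frac{1}{n}\sum_{i=1}^n a_{i,\pi'(i)} - \frac{1}{n(n-1)}\sum_{i \ne j} a_{i,\pi'(j)}\right) = 2\left(\frac{1}{n} + \frac{1}{n(n-1)}\right)\sum_{i=1}^n a_{i,\pi'(i)} = \frac{2}{n-1}\,Y'.$$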

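To construct $(Y^\dagger, Y^\ddagger)$ with distribution proportional to $(y' - y'')^2 dP(y', y'')$, choose $I^\dagger, K^\dagger, J^\dagger, L^\dagger$
with distribution proportional to the squared difference $(Y' - Y'')^2$, that is,

$$P(I^\dagger = i, K^\dagger = k, J^\dagger = j, L^\dagger = l) \sim [(a_{ik} + a_{jl}) - (a_{il} + a_{jk})]^2$$

and let

$$\pi^\dagger = \begin{cases} \pi\tau_{\pi^{-1}(K^\dagger),J^\dagger} & \text{if } L^\dagger = \pi(I^\dagger),\ K^\dagger \ne \pi(J^\dagger) \\ \pi\tau_{\pi^{-1}(L^\dagger),I^\dagger} & \text{if } L^\dagger \ne \pi(I^\dagger),\ K^\dagger = \pi(J^\dagger) \\ \pi\tau_{\pi^{-1}(K^\dagger),I^\dagger}\tau_{\pi^{-1}(L^\dagger),J^\dagger} & \text{otherwise,} \end{cases}$$

and $\pi^\ddagger = \pi^\dagger\tau_{I^\dagger,J^\dagger}$. Then (20) and (23) hold with $\mathcal{I} = \{I^\dagger, \pi^{-1}(K^\dagger), J^\dagger, \pi^{-1}(L^\dagger)\}$, a set of size
at most 4, so by (21), $|Y^* - Y| \le 8C$. $\blacksquare$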
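The linearity condition with $\lambda = 2/(n - 1)$ used in this proof can be confirmed by brute force for small $n$. The following sketch is an added illustration with hypothetical helper names; the $4 \times 4$ array is an arbitrary example with zero row sums, as (22) requires. It averages $Y''$ over all pairs $(I, J)$ for every $\pi' \in S_4$.

```python
from fractions import Fraction
from itertools import permutations


def stat(a, pi):
    """Y = sum_i a[i][pi(i)] as in (11), with 0-based indices."""
    return sum(a[i][pi[i]] for i in range(len(pi)))


def mean_after_random_transposition(a, pi):
    """E(Y'' | pi') where pi'' = pi' tau_{I,J} and (I, J) is uniform over I != J."""
    n = len(pi)
    total, count = Fraction(0), 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pi2 = list(pi)
            pi2[i], pi2[j] = pi[j], pi[i]    # pi''(I) = pi'(J), pi''(J) = pi'(I)
            total += stat(a, pi2)
            count += 1
    return total / count


# Example array (assumed for illustration); every row sums to zero.
EXAMPLE = [
    [3, -1, -2, 0],
    [1, 1, -4, 2],
    [0, 5, -5, 0],
    [-2, -2, 2, 2],
]

if __name__ == "__main__":
    n = len(EXAMPLE)
    lam = Fraction(2, n - 1)
    ok = all(
        mean_after_random_transposition(EXAMPLE, pi) == (1 - lam) * stat(EXAMPLE, pi)
        for pi in permutations(range(n))
    )
    print(ok)
```

Because the identity holds conditionally on $\pi'$, the check does not depend on $\pi'$ being uniform; uniformity is what makes $(Y', Y'')$ exchangeable.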



2.2 Permutations with distribution constant over cycle type 

In this section we focus on the normal approximation of $Y$ as in (11) when the distribution
of the random permutation $\pi$ is a function only of its cycle type. Our framework includes
the case considered in [22], uniform distribution over permutations with a single cycle.

Consider a permutation $\pi \in S_n$ represented in cycle form; in $S_7$ for example, $\pi =
((1, 3, 7, 5), (2, 6, 4))$ is the permutation consisting of one 4 cycle in which $1 \to 3 \to 7 \to 5 \to 1$,
and one 3 cycle where $2 \to 6 \to 4 \to 2$. For $q = 1, \ldots, n$, let $c_q(\pi)$ be the number of $q$ cycles
of $\pi$. We say permutations $\pi$ and $\sigma$ are of the same cycle type if $c_q(\pi) = c_q(\sigma)$ for all
$q = 1, \ldots, n$; $\pi$ and $\sigma$ are of the same cycle type if and only if $\pi$ and $\sigma$ are conjugate, i.e. if
and only if there exists a permutation $\rho$ such that $\pi = \rho^{-1}\sigma\rho$. Hence, we say a probability
measure $P$ on $S_n$ is constant over cycle type if

$$P(\pi) = P(\rho^{-1}\pi\rho) \quad\text{for all } \pi, \rho \in S_n. \tag{25}$$

In [18], the authors consider a statistical test for determining when a given pairing of
$n = 2m$ observations shows an unusually high level of similarity; the test statistic is of the
form (11), and, under the null hypothesis of no distinguished pairing, the distribution $P$
satisfies (25) with $P(\pi)$ equal to a constant if $\pi$ has $m$ 2-cycles, and $P(\pi) = 0$ otherwise;
that is, under the null, $P$ is uniform over permutations having $m$ 2-cycles. Bounds between
the normal and the null distribution of $Y$ were determined in [18] using a construction in
which an exchangeable $\pi''$ is obtained from $\pi'$ by a transformation which preserves the $m$
2-cycle structure. The construction in Theorem 2.6 preserves the cycle structure in general
and, when there are $m$ 2-cycles, specializes to one similar, but not equivalent, to that of [18].
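The notions of cycle type and conjugation above are easy to experiment with. The sketch below is an added illustration with hypothetical function names. It uses 0-based indices, so the $S_7$ example $((1, 3, 7, 5), (2, 6, 4))$ becomes the tuple `(2, 5, 6, 1, 0, 3, 4)`, and it reads $\rho^{-1}\pi\rho$ as the map $i \mapsto \rho^{-1}(\pi(\rho(i)))$, which is one of the two common composition conventions (the cycle type is the same under either).

```python
from itertools import permutations


def cycle_type(pi):
    """Sorted tuple of the cycle lengths of pi, where pi[i] is the image of i."""
    seen, lengths = set(), []
    for start in range(len(pi)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = pi[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


def conjugate(rho, pi):
    """The permutation i -> rho^{-1}(pi(rho(i)))."""
    inv = [0] * len(rho)
    for i, r in enumerate(rho):
        inv[r] = i
    return tuple(inv[pi[rho[i]]] for i in range(len(pi)))


if __name__ == "__main__":
    example = (2, 5, 6, 1, 0, 3, 4)     # ((1,3,7,5),(2,6,4)) with labels shifted by one
    print(cycle_type(example))          # one 3 cycle and one 4 cycle
    rho = (1, 0, 2, 3, 4, 5, 6)         # a transposition, chosen for illustration
    print(cycle_type(conjugate(rho, example)))
```

A measure satisfying (25) assigns the same probability to `pi` and to `conjugate(rho, pi)` for every `rho`, so it is determined by one weight per cycle type.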

Theorem 2.6 With $n \ge 4$, let an array $\{a_{ij}\}_{i,j=1}^n$ of real numbers satisfy (22), let

$$a_{ij} = a_{ji} \quad\text{and}\quad a_{ii} = 0, \tag{26}$$

and let $\pi \in S_n$ be a random permutation with distribution $P$ constant on cycle type, with no
fixed points. That is, $P$ satisfies (25) and

$$P(\pi) = 0 \quad\text{if } c_1(\pi) \ne 0. \tag{27}$$

Then, with $C$ as in (22), conclusions (9) and (10) of Theorem 1.1 hold for the sum
$Y = \sum_{i=1}^n a_{i,\pi(i)}$ with $A = 40C/\sigma$ when $A \le 1/12$.

Proof: To fully highlight the reason for the imposition of the conditions (26) and (27),
and also to make the complete case analysis easier to follow, we initially consider an array
satisfying only the consequence $\sum_{1 \le i,j \le n} a_{ij} = 0$ of (22), and a $P$ not necessarily satisfying
(27).

Again, using the construction outlined in Remark 2.3, we first construct $\pi''$ from the
given $\pi'$. Let $I$ and $J$, $1 \le I \ne J \le n$, be chosen uniformly and independently of $\pi'$, and let
$\pi'' = \tau_{IJ}\pi'\tau_{IJ}$; that is, $\pi''$ is obtained by interchanging $I$ and $J$ in the cycle representation of
$\pi'$. We claim the pair $\pi', \pi''$ is exchangeable. For fixed permutations $\sigma', \sigma''$, if $\sigma'' \ne \tau_{IJ}\sigma'\tau_{IJ}$
then

$$P(\pi'' = \sigma'', \pi' = \sigma') = 0 = P(\pi' = \sigma'', \pi'' = \sigma').$$

Otherwise, $\sigma'' = \tau_{IJ}\sigma'\tau_{IJ}$ and, using (25) for the second equality, we have

$$P(\pi'' = \sigma'', \pi' = \sigma') = P(\pi' = \sigma') = P(\pi' = \tau_{IJ}\sigma'\tau_{IJ}) = P(\pi'' = \sigma') = P(\pi' = \sigma'', \pi'' = \sigma').$$

Consequently, $Y'$ and $Y''$, given by (11) with permutations $\pi'$ and $\pi''$, respectively, are
exchangeable. By conditioning on $\pi$, we show $Y', Y''$ satisfies the linearity condition (16) with
$\lambda = 4/n$.

Let $S$ be the size of the set $\{I, J, \pi(I), \pi(J)\}$, and, for $i \in \{1, \ldots, n\}$ let $|i|$ denote the
number of elements in the cycle of $\pi$ that contains $i$. Since $I \ne J$, we have $2 \le S \le 4$. When
$S = 2$, either $\pi(I) = I$ and $\pi(J) = J$, or $\pi(I) = J$ and $\pi(J) = I$; in both cases $\pi'' = \pi$.
There are four cases for $S = 3$; either $A_{I,J} = \{|I| = 1, |J| \ge 2\}$ or $I$ and $J$ are interchanged
(denoted by $A_{J,I}$); or $I$, $J$ and $\pi(J)$ are three consecutive distinct values of $\pi$, indicated by
$B_{I,J}$, or $I$ and $J$ are interchanged (denoted by $B_{J,I}$). The case $S = 4$ is indicated by $F$.
Hence,

$$\begin{aligned}
Y' - Y'' ={}& \left(a_{I,I} + a_{\pi^{-1}(J),J} + a_{J,\pi(J)} - (a_{J,J} + a_{\pi^{-1}(J),I} + a_{I,\pi(J)})\right) A_{I,J} \\
&+ \left(a_{J,J} + a_{\pi^{-1}(I),I} + a_{I,\pi(I)} - (a_{I,I} + a_{\pi^{-1}(I),J} + a_{J,\pi(I)})\right) A_{J,I} \\
&+ \left(a_{\pi^{-1}(I),I} + a_{I,J} + a_{J,\pi(J)} - (a_{\pi^{-1}(I),J} + a_{J,I} + a_{I,\pi(J)})\right) B_{I,J} \\
&+ \left(a_{\pi^{-1}(J),J} + a_{J,I} + a_{I,\pi(I)} - (a_{\pi^{-1}(J),I} + a_{I,J} + a_{J,\pi(I)})\right) B_{J,I} \\
&+ \left(a_{\pi^{-1}(I),I} + a_{I,\pi(I)} + a_{\pi^{-1}(J),J} + a_{J,\pi(J)} - (a_{\pi^{-1}(I),J} + a_{J,\pi(I)} + a_{\pi^{-1}(J),I} + a_{I,\pi(J)})\right) F.
\end{aligned} \tag{28}$$

For example, using the fact that the sum of $a_{\pi^{-1}(j),j}$ is the same as that of $a_{j,\pi(j)}$ over a
given cycle, the contribution to $n(n - 1)E(Y' - Y''|\pi)$ from $A_{I,J} = \{|I| = 1, |J| \ge 2\}$, added
to the equal one from $A_{J,I}$, simplifies to

$$2(n - c_1(\pi))\sum_{|i|=1} a_{i,i} + 4c_1(\pi)\sum_{|i| \ge 2} a_{i,\pi(i)} - 2c_1(\pi)\sum_{|i| \ge 2} a_{i,i} - 2\sum_{|i|=1,|j| \ge 2} a_{i,j} - 2\sum_{|i| \ge 2,|j|=1} a_{i,j}. \tag{29}$$

Next, the equal contributions from $B_{I,J} = 1(\pi(I) = J, |I| \ge 3)$ and $B_{J,I}$ sum to

$$6\sum_{|i| \ge 3} a_{i,\pi(i)} - 4\sum_{|i| \ge 3} a_{\pi^{-1}(i),\pi(i)} - 2\sum_{|i| \ge 3} a_{\pi(i),i}. \tag{30}$$

On $F = 1(|I| \ge 2, |J| \ge 2, I \ne J, \pi(I) \ne J, \pi(J) \ne I)$, the contribution from $a_{\pi^{-1}(I),I}$ is

$$\sum_{|i|,|j| \ge 2} a_{\pi^{-1}(i),i}\, 1(i \ne j, \pi(i) \ne j, \pi(j) \ne i). \tag{31}$$

Let $i \sim j$ denote the fact that $i$ and $j$ are elements of the same cycle. When $i \sim j$ and
$\{i, j, \pi(i), \pi(j)\}$ are distinct, we have $|i| \ge 4$ and $|i| - 3$ possible choices for $j \sim i$ that satisfy
the conditions in the indicator in (31). Hence, the case $i \sim j$ contributes

$$\sum_{|i| \ge 4}\ \sum_{j \sim i} a_{\pi^{-1}(i),i}\, 1(i \ne j, \pi(i) \ne j, \pi(j) \ne i) = \sum_{|i| \ge 4} (|i| - 3)\, a_{\pi^{-1}(i),i} = \sum_{|i| \ge 3} (|i| - 3)\, a_{i,\pi(i)}.$$

When $i \not\sim j$ the conditions in the indicator function are satisfied if and only if $|i| \ge
2$, $|j| \ge 2$. For $|i| \ge 2$ there are $n - |i| - c_1(\pi)$ choices for $j$, so the case $i \not\sim j$ contributes

$$\sum_{|i| \ge 2} a_{\pi^{-1}(i),i} \sum_{j \not\sim i,\, |j| \ge 2} 1 = \sum_{|i| \ge 2} (n - |i| - c_1(\pi))\, a_{i,\pi(i)}.$$

The next three terms on $F$ give the same as the first, so in total we have

$$4(n - 2 - c_1(\pi))\sum_{|i|=2} a_{i,\pi(i)} + 4(n - 3 - c_1(\pi))\sum_{|i| \ge 3} a_{i,\pi(i)}. \tag{32}$$

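Decomposing the contribution from the fifth term, according to whether $i \sim j$ or $i \not\sim j$,
gives

$$\begin{aligned}
&-\sum_{|i| \ge 4}\ \sum_{j \sim i} a_{\pi^{-1}(i),j}\, 1(i \ne j, \pi(i) \ne j, \pi(j) \ne i) - \sum_{|i|,|j| \ge 2,\ i \not\sim j} a_{\pi^{-1}(i),j} \\
&\quad = -\sum_{|j| \ge 4}\ \sum_{i \sim j} a_{i,j} + \sum_{|i| \ge 4} \left(a_{i,\pi(i)} + a_{\pi^{-1}(i),\pi(i)} + a_{i,i}\right) - \sum_{|i|,|j| \ge 2,\ i \not\sim j} a_{i,j}.
\end{aligned} \tag{33}$$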

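To simplify (33), let $a \wedge b = \min(a, b)$ and consider the decomposition

$$\sum_{i,j=1}^n a_{i,j} = \sum_{|j| \ge 4}\sum_{i \sim j} a_{i,j} + \sum_{|j| \le 3}\sum_{i \sim j} a_{i,j} + \sum_{|i|,|j| \ge 2,\ i \not\sim j} a_{i,j} + \sum_{|i| \wedge |j| = 1,\ i \not\sim j} a_{i,j}. \tag{34}$$

Since $\sum_{i,j} a_{i,j} = 0$, we may replace the sum of the first and last terms in (33) by the sum of
the second and fourth terms on the right hand side of (34), respectively, resulting in

$$\begin{aligned}
&\sum_{|j| \le 3}\sum_{i \sim j} a_{i,j} + \sum_{|i| \wedge |j| = 1,\ i \not\sim j} a_{i,j} + \sum_{|i| \ge 4} \left(a_{i,\pi(i)} + a_{\pi^{-1}(i),\pi(i)} + a_{i,i}\right) \\
&\quad = \sum_{|j| \le 2}\sum_{i \sim j} a_{i,j} + \sum_{|i| \wedge |j| = 1,\ i \not\sim j} a_{i,j} + \sum_{|i| \ge 3} \left(a_{i,\pi(i)} + a_{\pi^{-1}(i),\pi(i)} + a_{i,i}\right),
\end{aligned}$$

where we have used the fact that $\pi^2(i) = \pi^{-1}(i)$ when $|i| = 3$. Similarly shifting the $|i| = 2$
term we obtain

$$\begin{aligned}
&\sum_{|j| = 1}\sum_{i \sim j} a_{i,j} + \sum_{|i| \wedge |j| = 1,\ i \not\sim j} a_{i,j} + \sum_{|i| \ge 2} \left(a_{i,\pi(i)} + a_{i,i}\right) + \sum_{|i| \ge 3} a_{\pi^{-1}(i),\pi(i)} \\
&\quad = \sum_{|i| \wedge |j| = 1,\ i \not\sim j} a_{i,j} + \sum_{|i| \ge 2} a_{i,\pi(i)} + \sum_{|i| \ge 1} a_{i,i} + \sum_{|i| \ge 3} a_{\pi^{-1}(i),\pi(i)}.
\end{aligned}$$

Combining this with the next three terms of $F$, each of which yields the same contribution,
gives

$$4\sum_{|i| \ge 2} a_{i,\pi(i)} + 4\sum_{|i| \ge 3} a_{\pi^{-1}(i),\pi(i)} + 4\sum_{|i| \ge 1} a_{i,i} + 4\sum_{|i| \wedge |j| = 1,\ i \not\sim j} a_{i,j}. \tag{35}$$

Combining (35) with the contribution (32) of the first four terms in $F$, the $A_{I,J}$ and $A_{J,I}$
terms in (29) and the $B_{I,J}$ and $B_{J,I}$ terms (30), yields $n(n - 1)E(Y' - Y''|\pi')$; after cancelling
the terms involving $a_{\pi^{-1}(i),\pi(i)}$ in (30) and (35) and grouping like terms, we obtain

$$4(n - 1)\sum_{|i| = 2} a_{i,\pi(i)} + (4n - 2)\sum_{|i| \ge 3} a_{i,\pi(i)} - 2\sum_{|i| \ge 3} a_{\pi(i),i} \tag{36}$$

$$+\ 2(n - c_1(\pi) + 2)\sum_{|i| = 1} a_{i,i} - 2(c_1(\pi) - 2)\sum_{|i| \ge 2} a_{i,i} \tag{37}$$

$$+\ 4\sum_{|i| \wedge |j| = 1,\ i \not\sim j} a_{i,j} - 2\sum_{|i| = 1,|j| \ge 2} a_{i,j} - 2\sum_{|i| \ge 2,|j| = 1} a_{i,j}. \tag{38}$$

The assumption that $a_{i,i} = 0$ causes the contribution from (37) to vanish, the assumption
that there are no 1-cycles causes the contribution from (38) to vanish, and the assumption
that $a_{ij}$ is symmetric causes the second and third terms in (36) to combine to
$4(n - 1)\sum_{|i| \ge 3} a_{i,\pi(i)}$, yielding
$E(Y' - Y''|\pi') = (4/n)\sum_{i=1}^n a_{i,\pi'(i)} = (4/n)Y'$. Hence, the linearity condition (16) is satisfied.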
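The conclusion $E(Y' - Y''|\pi') = (4/n)Y'$ can be checked by enumeration on a small case. The sketch below is an added illustration with hypothetical names. It assumes a $6 \times 6$ circulant array built from the row $(0, 1, 2, -6, 2, 1)$, chosen only because it is symmetric with zero diagonal and zero row sums, so that (22) and (26) hold. For every permutation without fixed points it averages $Y''$ over all pairs $(I, J)$, with $\pi'' = \tau_{IJ}\pi'\tau_{IJ}$.

```python
from fractions import Fraction
from itertools import permutations


def stat(a, pi):
    """Y = sum_i a[i][pi(i)] as in (11), with 0-based indices."""
    return sum(a[i][pi[i]] for i in range(len(pi)))


def conjugate_by_transposition(pi, i, j):
    """tau_{ij} pi tau_{ij}: interchange i and j in the cycle representation of pi."""
    t = list(range(len(pi)))
    t[i], t[j] = j, i
    return tuple(t[pi[t[k]]] for k in range(len(pi)))


def mean_conjugated_stat(a, pi):
    """E(Y'' | pi') with (I, J) uniform over ordered pairs I != J."""
    n = len(pi)
    values = [
        stat(a, conjugate_by_transposition(pi, i, j))
        for i in range(n) for j in range(n) if i != j
    ]
    return Fraction(sum(values), len(values))


def circulant(c):
    """Array with entries a[i][j] = c[(j - i) mod n]."""
    n = len(c)
    return [[c[(j - i) % n] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    a = circulant([0, 1, 2, -6, 2, 1])    # example array, an assumption of this sketch
    n = len(a)
    ok = all(
        mean_conjugated_stat(a, pi) == Fraction(n - 4, n) * stat(a, pi)
        for pi in permutations(range(n))
        if all(pi[i] != i for i in range(n))
    )
    print(ok)
```

The factor `Fraction(n - 4, n)` is $1 - \lambda$ with $\lambda = 4/n$. Dropping the symmetry of the array or allowing fixed points is expected to break the identity, in line with the roles of (26) and (27) in the argument above.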

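Since $\pi'' = \tau_{IJ}\pi\tau_{IJ}$, the terms that multiply the indicator functions in the difference
$Y' - Y''$ in (28) depend only on values in a subset of $\{\pi^{-1}(I), I, \pi(I), \pi^{-1}(J), J, \pi(J)\}$
determined by the event indicated; for example, on $B_{I,J}$ the difference only depends on
$\{\pi^{-1}(I), I, J, \pi(J)\}$. For each event we tabulate such values in a vector $\mathbf{i}$. Likewise, with
$\pi^\dagger$ and $\pi^\ddagger$ constructed according to $\pi^\ddagger = \tau_{I^\dagger J^\dagger}\pi^\dagger\tau_{I^\dagger J^\dagger}$, the difference $Y^\dagger - Y^\ddagger$ depends only
on a subset of $\{P^\dagger, I^\dagger, K^\dagger, Q^\dagger, J^\dagger, L^\dagger\}$, the corresponding values in the $\pi^\dagger$ cycle, which we
will tabulate in a vector $\mathbf{i}^\dagger$. Since $Y' - Y''$ in (28) is a sum of terms multiplied by indicator
functions of disjoint events, $(Y' - Y'')^2$ is a sum of those terms squared, multiplied by the
same indicator functions. Hence to generate $(\pi^\dagger, \pi^\ddagger)$ such that $(Y^\dagger, Y^\ddagger)$ has a distribution
proportional to $(y' - y'')^2 dP(y', y'')$, on each event we generate the elements of $\mathbf{i}^\dagger$ with square
weighted probability appropriate to the set indicated. Once the values in $\mathbf{i}^\dagger$ are chosen, in
order for $\pi^\dagger$ to have the conditional distribution of $\pi$ given these values, the remaining values
of $\pi^\dagger$ are obtained by interchanging $\mathbf{i}$ with $\mathbf{i}^\dagger$ in the cycle structure of $\pi$. That is, in each
case we specify $\pi^\dagger$ in terms of $\pi$ by

$$\pi^\dagger = \tau_{\mathbf{i},\mathbf{i}^\dagger}\,\pi\,\tau_{\mathbf{i},\mathbf{i}^\dagger} \quad\text{where}\quad \tau_{\mathbf{i},\mathbf{i}^\dagger} = \prod_{k=1}^{\kappa} \tau_{i_k, i_k^\dagger}, \tag{39}$$

and $\mathbf{i} = (i_1, \ldots, i_\kappa)$ and $\mathbf{i}^\dagger = (i_1^\dagger, \ldots, i_\kappa^\dagger)$ are vectors of disjoint indices, of some length $\kappa$.

For $\rho \in S_n$ and $\mathbf{l} = (l_1, \ldots, l_\kappa)$ any $\kappa$-dimensional vector of indices, let $\rho(\mathbf{l}) = \{\rho(l_k) : k =
1, \ldots, \kappa\}$, and let $\iota$ be the identity permutation. Since the values of $\tau_{\mathbf{i},\mathbf{i}^\dagger}\pi\tau_{\mathbf{i},\mathbf{i}^\dagger}$ may differ from
those of $\pi$ only at $\mathbf{i}$, $\mathbf{i}^\dagger$, $\pi^{-1}(\mathbf{i})$ and $\pi^{-1}(\mathbf{i}^\dagger)$, (20) will hold for the variables given by (23), with

$$\mathcal{I} = \bigcup_{\rho \in \{\iota, \pi^{-1}\}} \left(\rho(\mathbf{i}) \cup \rho(\mathbf{i}^\dagger)\right).$$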
The construction in each case proceeds as follows. Since 1-cycles are excluded, $A_{I,J}$ and
$A_{J,I}$ are null. On $B_{I,J}$, where $I$, $J$ and $\pi(J)$ are three distinct, consecutive values of $\pi$,
if $|I| = 3$ then the symmetry of $a_{ij}$ gives $Y'' = Y'$, an event on which the distribution of
$(Y^\dagger, Y^\ddagger)$, proportional to $(Y'' - Y')^2$, puts mass zero. Otherwise, $|I| \ge 4$ and $Y' - Y''$ depends
only on $\mathbf{i} = (\pi^{-1}(I), I, J, \pi(J))$, and we choose $\mathbf{i}^\dagger = (P^\dagger, I^\dagger, J^\dagger, L^\dagger)$, the corresponding values
for $\pi^\dagger$, according to the distribution

$$(P^\dagger, I^\dagger, J^\dagger, L^\dagger) \sim [(a_{p,i} + a_{j,l}) - (a_{p,j} + a_{i,l})]^2\, 1(p, i, j, \text{ and } l \text{ are distinct}),$$

noting that $a_{i,j}$ cancels with $a_{j,i}$ by symmetry. Now set $\pi^\dagger$ as specified in (39). In this case
$\mathcal{I}$ has size at most thirteen. Reversing the roles of $I$ and $J$ gives the construction on $B_{J,I}$.
Next consider $F$, where $I$, $\pi(I)$, $J$ and $\pi(J)$ are distinct. If $|I| = |J| = 2$ then take

$$(I^\dagger, K^\dagger, J^\dagger, L^\dagger) \sim [(a_{i,k} + a_{j,l}) - (a_{i,l} + a_{j,k})]^2\, 1(i, k, j \text{ and } l \text{ are distinct}),$$

and set $\pi^\dagger$ as specified in (39) with $\mathbf{i} = (I, \pi(I), J, \pi(J))$ and $\mathbf{i}^\dagger = (I^\dagger, K^\dagger, J^\dagger, L^\dagger)$, and with
the size of $\mathcal{I}$ at most twelve. For $|I| \ge 3$ and $|J| = 2$, take

$$(P^\dagger, I^\dagger, K^\dagger, J^\dagger, L^\dagger) \sim [(a_{p,i} + a_{i,k} + 2a_{j,l}) - (a_{p,j} + a_{j,k} + 2a_{i,l})]^2\, 1(p, i, k, j, \text{ and } l \text{ are distinct}),$$

and set $\pi^\dagger$ as specified in (39), with $\mathbf{i} = (\pi^{-1}(I), I, \pi(I), J, \pi(J))$ and $\mathbf{i}^\dagger = (P^\dagger, I^\dagger, K^\dagger, J^\dagger, L^\dagger)$,
and with the size of $\mathcal{I}$ at most sixteen. Reversing the roles of $I$ and $J$ gives the case in which
$|I| = 2$ but $|J| \ge 3$. For $|I| \ge 3$, $|J| \ge 3$, take

$$(P^\dagger, I^\dagger, K^\dagger, Q^\dagger, J^\dagger, L^\dagger) \sim [(a_{p,i} + a_{i,k} + a_{q,j} + a_{j,l}) - (a_{p,j} + a_{j,k} + a_{q,i} + a_{i,l})]^2\, 1(p, i, k, q, j, \text{ and } l \text{ distinct}),$$

and set $\pi^\dagger$ as specified in (39), with $\mathbf{i} = (\pi^{-1}(I), I, \pi(I), \pi^{-1}(J), J, \pi(J))$ and
$\mathbf{i}^\dagger = (P^\dagger, I^\dagger, K^\dagger, Q^\dagger, J^\dagger, L^\dagger)$. In this case, the size of $\mathcal{I}$ is at most twenty and, by (21),
$|Y^* - Y| \le 40C$ in all cases. $\blacksquare$



3 Size Biasing: Permutations and Patterns 

In this section we derive corollaries of Theorem 1.2 to obtain Berry Esseen bounds for the
number of occurrences of fixed, relatively ordered sub-sequences, such as rising sequences, in
a random permutation, and of color patterns, local maxima, and sub-graphs in finite graphs.

Following [17], given a finite collection $\mathbf{X} = \{X_\alpha, \alpha \in A\}$ of non-negative random
variables with index set $A$, for $\alpha \in A$ we say the collection $\mathbf{X}^\alpha = \{X_\beta^\alpha, \beta \in A\}$ has the
X-size-biased distribution in direction $\alpha$ if

$$EX_\alpha f(\mathbf{X}) = EX_\alpha\, Ef(\mathbf{X}^\alpha) \tag{40}$$

for all functions $f$ on $\mathbf{X}$ for which these expectations exist. For the given $\mathbf{X}$, the collection
$\mathbf{X}^\alpha$ exists for any $\alpha \in A$ and has distribution $dP^\alpha(\mathbf{x}) = x_\alpha dP(\mathbf{x})/EX_\alpha$, where $dP(\mathbf{x})$ is the
distribution of $\mathbf{X}$. Specializing (40) to the coordinate function $f(\mathbf{X}) = g(X_\alpha)$, we see that
$X_\alpha^\alpha$ has the $X_\alpha$-size-biased distribution $X_\alpha^s$, defined in (4).

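Corollary 3.1 Let $\mathbf{X} = \{X_\alpha, \alpha \in A\}$ be a finite collection of random variables with values
in $[0, M]$ and let $Y = \sum_{\alpha \in A} X_\alpha$. Assume, for each $\alpha \in A$, there exists a dependency
neighborhood $B_\alpha \subset A$ such that

$$X_\alpha \text{ and } \{X_\beta : \beta \notin B_\alpha\} \text{ are independent.} \tag{41}$$

Furthermore, let $p_\alpha = EX_\alpha/\sum_{\beta \in A} EX_\beta$ and $\max_\alpha |B_\alpha| = b$. For each $\alpha \in A$, let $(\mathbf{X}, \mathbf{X}^\alpha)$
be a coupling of $\mathbf{X}$ to an $\mathbf{X}^\alpha$ with the X-size-biased distribution in direction $\alpha$, and let
$\mathcal{D} \subset A \times A$ and $\mathcal{F} \supset \sigma\{Y\}$ be such that if $(\alpha_1, \alpha_2) \notin \mathcal{D}$ then

$$\mathrm{Cov}\left(E(X_{\beta_1}^{\alpha_1} - X_{\beta_1}|\mathcal{F}),\ E(X_{\beta_2}^{\alpha_2} - X_{\beta_2}|\mathcal{F})\right) = 0 \quad\text{for all } (\beta_1, \beta_2) \in B_{\alpha_1} \times B_{\alpha_2}. \tag{42}$$

Then Theorem 1.2 may be applied with

$$B = bM \quad\text{and}\quad \Delta \le M\sqrt{\sum_{(\alpha_1,\alpha_2) \in \mathcal{D}} p_{\alpha_1}p_{\alpha_2}|B_{\alpha_1}||B_{\alpha_2}|} \le (\max_\alpha p_\alpha)\, bM\sqrt{|\mathcal{D}|}. \tag{43}$$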
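Proof: Assuming, without loss of generality, that $EX_\alpha > 0$ for each $\alpha \in A$, the factorization

$$P^\alpha(\mathbf{X} \in d\mathbf{x}) = P(X_\alpha^\alpha \in dx_\alpha)\, P(\mathbf{X} \in d\mathbf{x}\,|\,X_\alpha = x_\alpha)$$

shows that we can construct $\mathbf{X}^\alpha$ by first choosing $X_\alpha^\alpha$ from the $X_\alpha$-size-bias distribution,
and then choosing the remaining variables from the conditional distribution of $\mathbf{X}$, given
the chosen value of $X_\alpha^\alpha$. Note that $X_\beta^\alpha \in [0, M]$ for all $\alpha, \beta$ and, by (41), that we may
take $X_\beta^\alpha = X_\beta$ for $\beta \notin B_\alpha$. By Lemma 2.1 of [17], $Y^s = \sum_{\beta \in A} X_\beta^I$ has the Y-size-biased
distribution, where the random index $I$ has distribution $P(I = \alpha) = p_\alpha$, and is independent
of both $(\mathbf{X}, \mathbf{X}^\alpha)$ and $\mathcal{F}$. Hence

$$Y^s - Y = \sum_{\beta \in B_I} (X_\beta^I - X_\beta) \quad\text{and therefore}\quad |Y^s - Y| \le bM. \tag{44}$$

Since $\sigma\{Y\} \subset \mathcal{F}$, $\Delta^2 = \mathrm{Var}(E(Y^s - Y|Y)) \le \mathrm{Var}(E(Y^s - Y|\mathcal{F}))$. Taking conditional
expectation with respect to $\mathcal{F}$ in (44) yields

$$E(Y^s - Y|\mathcal{F}) = \sum_{\alpha \in A} p_\alpha \sum_{\beta \in B_\alpha} E(X_\beta^\alpha - X_\beta|\mathcal{F})$$

and, therefore,

$$\mathrm{Var}(E(Y^s - Y|\mathcal{F})) = \sum_{(\alpha_1,\alpha_2) \in A \times A} p_{\alpha_1}p_{\alpha_2} \sum_{(\beta_1,\beta_2) \in B_{\alpha_1} \times B_{\alpha_2}} \mathrm{Cov}\left(E(X_{\beta_1}^{\alpha_1} - X_{\beta_1}|\mathcal{F}),\ E(X_{\beta_2}^{\alpha_2} - X_{\beta_2}|\mathcal{F})\right).$$

Using (42), we may replace the sum over $(\alpha_1, \alpha_2) \in A \times A$ by the sum over $(\alpha_1, \alpha_2) \in \mathcal{D}$,
and subsequent application of the Cauchy Schwarz inequality yields the bound for $\Delta$. $\blacksquare$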
If, in some asymptotic regime, the X^ are comparable in expectation in such a way that 
Pa ~ l-^l"^; if A* and grow like |^|; if b remains bounded; and if \V\ is of order |^|, then, 
in Theorem 11.21 A and A and, therefore, 6 are of order 1/cr. 

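Corollary 3.2 Let $\mathcal{G}$ be an index set, let $\{C_g, g \in \mathcal{G}\}$ be a collection of independent random elements taking values in an arbitrary set $\mathcal{C}$, let $\{\mathcal{G}_\alpha, \alpha \in \mathcal{A}\}$ be a finite collection of subsets of $\mathcal{G}$, and, for $\alpha \in \mathcal{A}$, let

$$X_\alpha = X_\alpha(C_g : g \in \mathcal{G}_\alpha)$$

be a function of the variables $\{C_g, g \in \mathcal{G}_\alpha\}$, taking values in $[0, M]$. Then Theorem 1.2 may be applied to $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$ with $B$ and $\Delta$ as in (43), taking $p_\alpha = EX_\alpha / \sum_{\beta \in \mathcal{A}} EX_\beta$,

$$B_\alpha = \{\beta \in \mathcal{A} : \mathcal{G}_\beta \cap \mathcal{G}_\alpha \ne \emptyset\} \quad \text{for } \alpha \in \mathcal{A}, \qquad (45)$$

and any $\mathcal{D}$ for which

$$\mathcal{D} \supset \{(\alpha_1, \alpha_2) : \text{there exists } (\beta_1, \beta_2) \in B_{\alpha_1} \times B_{\alpha_2} \text{ with } \mathcal{G}_{\beta_1} \cap \mathcal{G}_{\beta_2} \ne \emptyset\}. \qquad (46)$$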

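Proof: Since $X_\alpha$ and $X_\beta$ are functions of disjoint sets of independent variables when $\mathcal{G}_\alpha \cap \mathcal{G}_\beta = \emptyset$, (41) holds with the dependency neighborhoods given by (45). Now, for each $\alpha \in \mathcal{A}$, consider the following $(\mathbf{X}, \mathbf{X}^\alpha)$ coupling. Let $\{C_g^\alpha, g \in \mathcal{G}_\alpha\}$ be independent of $\{C_g, g \in \mathcal{G}\}$ and have distribution

$$dP^\alpha(c_g, g \in \mathcal{G}_\alpha) = \frac{X_\alpha(c_g, g \in \mathcal{G}_\alpha)}{EX_\alpha(C_g, g \in \mathcal{G}_\alpha)}\, dP(c_g, g \in \mathcal{G}_\alpha).$$

Then, by direct verification of (40), the collection

$$X_\beta^\alpha = X_\beta(C_g, g \in \mathcal{G}_\beta \cap \mathcal{G}_\alpha^c,\; C_g^\alpha, g \in \mathcal{G}_\beta \cap \mathcal{G}_\alpha), \quad \beta \in \mathcal{A}$$

has the $\mathbf{X}^\alpha$ distribution. Taking $\mathcal{F} = \sigma\{C_g : g \in \mathcal{G}\}$, we have $E(X_\beta^\alpha \mid \mathcal{F}) = E(X_\beta^\alpha \mid C_g, g \in \mathcal{G}_\beta)$ and, since $E(X_\beta \mid \mathcal{F}) = X_\beta$, the conditional expectation $E(X_\beta^\alpha - X_\beta \mid \mathcal{F})$ is a function of $\{C_g, g \in \mathcal{G}_\beta\}$ only. In particular, if $(\alpha_1, \alpha_2) \notin \mathcal{D}$ then, for all $\beta_1 \in B_{\alpha_1}$ and $\beta_2 \in B_{\alpha_2}$, we have $\mathcal{G}_{\beta_1} \cap \mathcal{G}_{\beta_2} = \emptyset$ and, consequently, $E(X_{\beta_1}^{\alpha_1} - X_{\beta_1} \mid \mathcal{F})$ and $E(X_{\beta_2}^{\alpha_2} - X_{\beta_2} \mid \mathcal{F})$ are independent, yielding (42), and all conditions of Corollary 3.1 hold. ■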

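With the exception of Example 3.5, in the remainder of this section we consider graphs $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ having random elements $\{C_g\}_{g \in \mathcal{V} \cup \mathcal{E}}$ assigned to their vertices and edges, and applications of Corollary 3.2 to the sum $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$ of bounded functions $X_\alpha = X_\alpha(C_g, g \in \mathcal{V}_\alpha \cup \mathcal{E}_\alpha)$, where $\mathcal{G}_\alpha = (\mathcal{V}_\alpha, \mathcal{E}_\alpha), \alpha \in \mathcal{A}$ is a given finite family of subgraphs of $\mathcal{G}$; we abuse notation slightly in that a graph $\mathcal{G}$ is replaced by $\mathcal{V} \cup \mathcal{E}$ when used as an index set for the underlying variables $C_g$. When $\{C_g\}_{g \in \mathcal{G}}$ are independent, Corollary 3.2 applies and, in (45) and (46), the intersection of the two graphs $(\mathcal{V}_1, \mathcal{E}_1)$ and $(\mathcal{V}_2, \mathcal{E}_2)$ is the graph $(\mathcal{V}_1 \cap \mathcal{V}_2, \mathcal{E}_1 \cap \mathcal{E}_2)$.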

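Furthermore, if $\mathcal{A} \subset \mathcal{V}$ and there is a distance $d(\alpha, \beta)$ defined on $\mathcal{A}$, then letting

$$\rho = \min\{\varrho : \mathcal{V}_\alpha \cap \mathcal{V}_\beta = \emptyset \text{ for all } \alpha, \beta \in \mathcal{A} \text{ with } d(\alpha, \beta) > \varrho\}, \qquad (47)$$

we may use

$$B_\alpha = \{\beta : d(\alpha, \beta) \le \rho\} \quad \text{and} \quad \mathcal{D} = \{(\alpha_1, \alpha_2) : d(\alpha_1, \alpha_2) \le 3\rho\} \qquad (48)$$

in (45) and (46), respectively, since rearranging $d(\alpha_1, \alpha_2) \le d(\alpha_1, \beta_1) + d(\beta_1, \beta_2) + d(\beta_2, \alpha_2)$ gives

$$d(\beta_1, \beta_2) \ge d(\alpha_1, \alpha_2) - (d(\alpha_1, \beta_1) + d(\alpha_2, \beta_2)) \ge d(\alpha_1, \alpha_2) - 2\rho > \rho$$

for $(\alpha_1, \alpha_2) \notin \mathcal{D}$ and $(\beta_1, \beta_2) \in B_{\alpha_1} \times B_{\alpha_2}$.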

For $v \in \mathcal{V}$ and $r \ge 0$ let $\mathcal{G}_{v,r}$ be the restriction of $\mathcal{G}$ to the vertices at most a distance $r$ from $v$; that is, $\mathcal{G}_{v,r}$ has vertex set $\mathcal{V}_{v,r} = \{w \in \mathcal{V} : d(v, w) \le r\}$ and edge set $\mathcal{E}_{v,r} = \{\{w, u\} \in \mathcal{E} : w, u \in \mathcal{V}_{v,r}\}$. We say that a graph $\mathcal{G}$ is distance $r$-regular if $\mathcal{G}_{v,r}$ is isomorphic to some graph $(\mathcal{V}_r, \mathcal{E}_r)$ for all $v$. For example, a graph of constant degree is distance 1-regular. This
notion of distance r-regular is related to, but not the same as, the notion of a distance-regular 
graph as given in jB] and [Hj. For a distance r-regular graph let 

$$V(r) = |\mathcal{V}_r|. \qquad (49)$$

Corollary 3.3 below follows from Corollary 3.2 as a consequence of the remarks above, and by noting that the given assumptions imply that $|\mathcal{D}| = |\mathcal{A}| V(3\rho)$ and that $EX_\alpha$ is constant, yielding $p_\alpha = 1/|\mathcal{A}|$.

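Corollary 3.3 Let $\mathcal{G}$ be a graph with a finite family of isomorphic subgraphs $\{\mathcal{G}_\alpha, \alpha \in \mathcal{A}\}$, $\mathcal{A} \subset \mathcal{V}$, let $d(\cdot, \cdot)$ be a distance on $\mathcal{A}$, and define $\rho$ as in (47). For each $\alpha \in \mathcal{A}$, let $X_\alpha$ be given by

$$X_\alpha = X(C_g, g \in \mathcal{G}_\alpha) \qquad (50)$$

for a fixed function $X$ taking values in $[0, M]$, and let the elements of $\{C_g\}_{g \in \mathcal{G}}$ be independent, with $\{C_g : g \in \mathcal{G}_\alpha\}$ identically distributed. If $\mathcal{G}$ is a distance-$3\rho$-regular graph, then Theorem 1.2 may be applied to $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$ with $V(r)$ as given in (49) and

$$B = V(\rho) M, \qquad \Delta \le M |\mathcal{A}|^{-1/2} V(\rho) V(3\rho)^{1/2}. \qquad (51)$$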

Natural families of examples can be generated using the vertex set $\mathcal{V} = \{1, \ldots, n\}^p$ with componentwise addition modulo $n$, and $d(\alpha, \beta)$ given by, e.g., a norm distance $\|\alpha - \beta\|$.

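Example 3.4 (Sliding $m$-window.) For $n \ge m \ge 1$, let $\mathcal{A} = \mathcal{V} = \{1, \ldots, n\}$ considered modulo $n$, $\{C_g : g \in \mathcal{G}\}$ i.i.d. real valued random variables, and for each $\alpha \in \mathcal{A}$

$$\mathcal{G}_\alpha = (\mathcal{V}_\alpha, \mathcal{E}_\alpha), \text{ where } \mathcal{V}_\alpha = \{v \in \mathcal{V} : \alpha \le v \le \alpha + m - 1\} \text{ and } \mathcal{E}_\alpha = \emptyset. \qquad (52)$$

Then for $X : \mathbb{R}^m \to [0, 1]$, Corollary 3.3 may be applied to the sum $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$ of the $m$-dependent sequence $X_\alpha = X(C_\alpha, \ldots, C_{\alpha + m - 1})$, formed by applying the function $X$ to the variables in the '$m$-window' $\mathcal{V}_\alpha$. In this example, taking $d(\alpha, \beta) = |\alpha - \beta|$ gives $\rho = m - 1$ and $V(r) = 2r + 1$. Hence, from (51), $B = 2m - 1$ and $\Delta \le n^{-1/2} (2m - 1)(6m - 5)^{1/2}$.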
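The arithmetic behind the sliding-window constants can be sketched as follows. This is only an illustration of how $\rho = m - 1$ and $V(r) = 2r + 1$ feed into the general bounds $B = V(\rho)M$ and $\Delta \le M|\mathcal{A}|^{-1/2} V(\rho) V(3\rho)^{1/2}$; the function name `window_bounds` is hypothetical, and the check that the ball of radius $3\rho$ fits on the cycle is an added assumption, not a condition stated in the text.

```python
import math


def window_bounds(n, m, M=1.0):
    """Return (B, bound on Delta) for the sliding m-window on a cycle of length n."""
    rho = m - 1  # windows at distance greater than m - 1 do not overlap

    def V(r):
        # Number of points within distance r of a point on the cycle.
        return 2 * r + 1

    # Assumption: V(r) = 2r + 1 is only used when the ball does not wrap around.
    if V(3 * rho) > n:
        raise ValueError("n is too small for V(3 * rho) = 6m - 5 points to be distinct")
    B = V(rho) * M
    delta_bound = M * n ** -0.5 * V(rho) * math.sqrt(V(3 * rho))
    return B, delta_bound


B, delta_bound = window_bounds(n=100, m=3)
```

For $n = 100$ and $m = 3$ this evaluates $B = 2m - 1 = 5$ and the bound $n^{-1/2}(2m-1)(6m-5)^{1/2} = 5\sqrt{13}/10$.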

In Example 3.5, the underlying variables are not independent, and Corollaries 3.2 and 3.3 cannot be directly applied.

Example 3.5 (Relatively ordered sub-sequences of a random permutation.) For $n \ge m \ge 1$, let $\pi$ be a uniform random permutation of the integers $\mathcal{V} = \{1, \ldots, n\}$, taken modulo $n$. For a permutation $\tau$ on $\{1, \ldots, m\}$, let $\mathcal{G}_\alpha$ and $\mathcal{V}_\alpha$ be as specified in (52), and let $X_\alpha$ be the indicator function requiring that the pattern $\tau$ appears on $\mathcal{V}_\alpha$; that is, that the values $\{\pi(v)\}_{v \in \mathcal{V}_\alpha}$ and $\{\tau(v)\}_{v \in \mathcal{V}_1}$ are in the same relative order. Equivalently, the pattern $\tau$ appears on $\mathcal{V}_\alpha$ if and only if $\pi(\tau^{-1}(v) + \alpha - 1), v \in \mathcal{V}_1$ is an increasing sequence, and we write

$$X_\alpha(\pi(v), v \in \mathcal{V}_\alpha) = \mathbf{1}\left(\pi(\tau^{-1}(1) + \alpha - 1) < \cdots < \pi(\tau^{-1}(m) + \alpha - 1)\right).$$

With $\mathcal{A} = \mathcal{V}$, the sum $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$ counts the number of $m$-element-long segments of $\pi$ that have the same relative order as $\tau$.
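The statistic $Y$ of this example can be computed directly from its definition. The sketch below is illustrative only: permutations are represented as Python sequences of the values $1, \ldots, n$ in position order, positions are taken cyclically, and the helper names `relative_order` and `count_pattern` are hypothetical.

```python
def relative_order(values):
    """Return the ranks (1 = smallest) of a sequence of distinct values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return tuple(ranks)


def count_pattern(pi, tau):
    """Count cyclic windows of length len(tau) in pi whose relative order equals tau.

    tau is assumed to be a permutation of 1..m, given as a sequence.
    """
    n, m = len(pi), len(tau)
    target = tuple(tau)
    return sum(
        relative_order([pi[(start + offset) % n] for offset in range(m)]) == target
        for start in range(n)
    )


# Rising sequences of length 2 in the identity permutation: every window
# except the one that wraps around from 5 back to 1.
rising = count_pattern([1, 2, 3, 4, 5], (1, 2))
# Descents in a short permutation, windows (2,1), (1,3), (3,2).
descents = count_pattern([2, 1, 3], (2, 1))
```

Comparing rank vectors is equivalent to the increasing-sequence formulation: a window has rank vector $\tau$ exactly when reading its values in the order $\tau^{-1}(1), \ldots, \tau^{-1}(m)$ gives an increasing sequence.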
For $\alpha \in \mathcal{A}$, we generate $\mathbf{X}^\alpha = \{X_\beta^\alpha, \beta \in \mathcal{A}\}$ by reordering the values of $\pi(\gamma)$ for $\gamma \in \mathcal{V}_\alpha$, to be in the same relative order as $\tau$, and let $X_\beta^\alpha$ be the indicator requiring $\tau$ to appear at position $\beta$ in the reordered permutation. Letting $\mathcal{F} = \sigma\{\pi\}$, we have $E(X_\beta^\alpha \mid \mathcal{F})$ and $X_\beta$ depend only on the relative order of $\{\pi(\gamma) : -(m - 1) \le \gamma - \beta \le 2(m - 1)\}$. Since the relative orders of non-overlapping segments of the values of $\pi$ are independent, (41) and (42) hold when $B_\alpha$ and $\mathcal{D}$ are as in (48), for $d(\alpha, \beta) = |\alpha - \beta|$ and $\rho = m - 1$; hence, Theorem 1.2 may be applied with the same value for $B$ and bound on $\Delta$ as in Example 3.4.

When $\tau = \iota_m$, the identity permutation of length $m$, we say that $\pi$ has a rising sequence of length $m$ at position $\alpha$ if $X_\alpha = 1$. Rising sequences were studied in |lj in connection with
card tricks and card shuffling. Due to the regular-self-overlap property of rising sequences, 
namely that a non-empty intersection of two rising sequences is again a rising sequence, some 
improvement on the constant in the bound can be obtained by a more careful consideration 
of the conditional variance. 

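Example 3.6 (Coloring patterns and subgraph occurrences on a finite graph $\mathcal{G}$.) For illustration, take $\mathcal{V} = \mathcal{A} = \{1, \ldots, n\}^p$ considered modulo $n$, let $d(\alpha, \beta) = \|\alpha - \beta\|$ with $\|\cdot\|$ the sup norm, let $\mathcal{E} = \{\{w, v\} : d(w, v) = 1\}$, and, for each $\alpha \in \mathcal{A}$, let $\mathcal{G}_\alpha = (\mathcal{V}_\alpha, \mathcal{E}_\alpha)$ where

$$\mathcal{V}_\alpha = \{\alpha + (e_1, \ldots, e_p) : e_i \in \{0, 1\}\} \quad \text{and} \quad \mathcal{E}_\alpha = \{\{v, w\} : v, w \in \mathcal{V}_\alpha, d(w, v) = 1\}.$$

Let $\mathcal{C}$ be a set (of e.g. colors) from which is formed a given pattern $\{c_g : g \in \mathcal{G}_0\}$, let $\{C_g, g \in \mathcal{G}\}$ be independent variables in $\mathcal{C}$ with $\{C_g : g \in \mathcal{G}_\alpha\}_{\alpha \in \mathcal{A}}$ identically distributed, and let

$$X(C_g, g \in \mathcal{G}_0) = \prod_{g \in \mathcal{G}_0} \mathbf{1}(C_g = c_g), \qquad (53)$$

and $X_\alpha$ given by (50). Then $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$ counts the number of times the pattern appears in the subgraphs $\mathcal{G}_\alpha$. Corollary 3.3 may be applied with $M = 1$, $\rho = 1$ (by (47)), $V(r) = (2r + 1)^p$, and (by (51)) $B = 3^p$ and $\Delta \le (63/n)^{p/2}$.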

Such multi-dimensional pattern occurrences are a generalization of the well-studied case 
in which one-dimensional sequences are scanned for pattern occurrences; see, for instance, 
[TT^ and [2n] for scan and window statistics, see [21] for applications of the normal ap- 
proximation in this context to molecular sequence data, and see also and ^2], where 
higher-dimensional extensions are considered.

Occurrences of subgraphs can be handled as a special case. For example, with $(\mathcal{V}, \mathcal{E})$ the graph above, let $G$ be the random subgraph with vertex set $\mathcal{V}$ and random edge set $\{e \in \mathcal{E} : C_e = 1\}$ where $\{C_e\}_{e \in \mathcal{E}}$ are independent and identically distributed Bernoulli variables. Then, say, taking the product in (53) over edges $e \in \mathcal{E}_0$ and setting $c_e = 1$, the sum $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$ counts the number of times that copies of $\mathcal{E}_0$ appear in the random graph $G$; the same bounds hold as above.

The authors of studied the related problem of counting the number of small cliques 
that occur in the random binomial graph, a case in which the dependence is not local; the 
technique applied is the Chen-Stein method. 

Example 3.7 (Local extremes.) Let $\mathcal{G}_\alpha, \alpha \in \mathcal{A}$, be a collection of subgraphs of $\mathcal{G}$ isomorphic to $\mathcal{G}_0$, let $v \in \mathcal{V}_0$ be a distinguished vertex, let $\{C_g, g \in \mathcal{V}\}$ be a collection of independent and identically distributed random variables, and let $X_\alpha$ be defined by (50) with

$$X(C_\beta, \beta \in \mathcal{V}_0) = \mathbf{1}(C_v \ge C_\beta, \beta \in \mathcal{V}_0).$$

Then the sum $Y = \sum_{\alpha \in \mathcal{A}} X_\alpha$ counts the number of times the vertex in $\mathcal{G}_\alpha$ which corresponds under the isomorphism to the distinguished vertex $v \in \mathcal{V}_0$ is a local maximum. Corollary 3.3 holds with $M = 1$, the other quantities determining the bound being dependent on the structure of $\mathcal{G}$.

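For example, consider the hypercube $\mathcal{V} = \{0, 1\}^p$ and $\mathcal{E} = \{\{v, w\} : \|v - w\| = 1\}$, where $\|\cdot\|$ is the Hamming distance (see also P and j2]). Take $v = 0$, $\mathcal{A} = \mathcal{V}$, and, for each $\alpha \in \mathcal{A}$, let $\mathcal{V}_\alpha = \{\beta : \|\beta - \alpha\| \le 1\}$ and $\mathcal{E}_\alpha = \{\{v, w\} : v, w \in \mathcal{V}_\alpha, \|v - w\| = 1\}$. Corollary 3.3 applies with $\rho = 2$ (by (47)), $V(r) = \sum_{j=0}^{r} \binom{p}{j}$, and (by (51))

$$B = 1 + p + \binom{p}{2} \quad \text{and} \quad \Delta \le 2^{-p/2} \left(1 + p + \binom{p}{2}\right) \Bigl( \sum_{j=0}^{6} \binom{p}{j} \Bigr)^{1/2}.$$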
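As a rough numerical illustration of the hypercube case, the following sketch evaluates $B = V(\rho)M$ and the bound $M|\mathcal{A}|^{-1/2} V(\rho) V(3\rho)^{1/2}$ with $\rho = 2$, $|\mathcal{A}| = 2^p$ and $V(r) = \sum_{j \le r} \binom{p}{j}$. The function name `hypercube_bounds` is hypothetical, and the code only restates the arithmetic; it does not verify the hypotheses of the corollary.

```python
from math import comb, isclose, sqrt


def hypercube_bounds(p, M=1.0):
    """Return (B, bound on Delta) for local maxima on the p-dimensional hypercube."""
    rho = 2  # radius-1 Hamming balls are disjoint once centres are more than 2 apart

    def V(r):
        # Size of a Hamming ball of radius r; comb(p, j) is 0 when j > p.
        return sum(comb(p, j) for j in range(r + 1))

    B = V(rho) * M
    delta_bound = M * 2 ** (-p / 2) * V(rho) * sqrt(V(3 * rho))
    return B, delta_bound


B10, delta10 = hypercube_bounds(10)
_, delta60 = hypercube_bounds(60)
```

Since $V(2)V(6)^{1/2}$ grows only polynomially in $p$ while $2^{-p/2}$ decays geometrically, the bound on $\Delta$ is large for small $p$ and becomes small only for larger dimensions, as comparing `delta10` with `delta60` suggests.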



4 Proofs of Theorems 1.1 and 1.2 

In this section, $\mathcal{H}$ denotes a class of measurable functions satisfying properties (i), (ii), and (iii) (as described in Section 1), and $h$ denotes an element of $\mathcal{H}$. Recall the definition of $\delta$ from Section 1, let $\varphi(t)$ denote the standard normal density, and, for $t \in (0, 1)$, define

$$h_t(x) = \int h(x + ty)\varphi(y)\,dy \quad \text{and} \quad \delta_t = \sup\{|Eh_t(W) - Nh_t| : h \in \mathcal{H}\}. \qquad (54)$$



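Lemma 4.1 For a random variable $W$ on $\mathbb{R}$, we have

$$\delta \le 2.8\delta_t + 4.7at \quad \text{for all } t \in (0, 1), \qquad (55)$$

where $a$ is the constant from the conditions on $\mathcal{H}$ in Section 1. Furthermore, for all $A > 0$, with $\tilde{h}_\epsilon = h_\epsilon^+ - h_\epsilon^-$ as defined there,

$$E \int \tilde{h}_{A + t|y|}(W) |\varphi'(y)|\,dy \le 2\delta + a(A + t). \qquad (56)$$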

Proof: Inequality ()55p is Lemma 4.1 of [23], following Lemma 2.11 of ^H], which stems from 
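As in [23], adding and subtracting to the left hand side of (56) we have

$$\begin{aligned}
& E\left( \int \bigl(\tilde{h}_{A+t|y|}(W) - \tilde{h}_{A+t|y|}(Z)\bigr) |\varphi'(y)|\,dy + \int \tilde{h}_{A+t|y|}(Z) |\varphi'(y)|\,dy \right) \\
&\quad \le \int \bigl|E\tilde{h}_{A+t|y|}(W) - E\tilde{h}_{A+t|y|}(Z)\bigr|\, |\varphi'(y)|\,dy + \int E\tilde{h}_{A+t|y|}(Z) |\varphi'(y)|\,dy \qquad (57) \\
&\quad \le 2\delta + \int a(A + t|y|) |\varphi'(y)|\,dy \le 2\delta + a(A + t),
\end{aligned}$$

where for the first term in (57) we have used the facts that $h_{A+t|y|}^{\pm} \in \mathcal{H}$ and $\int |\varphi'(y)|\,dy \le 1$. For the second term, we have used the bound $E\tilde{h}_\epsilon(Z) \le a\epsilon$ and the fact that $\int |y| |\varphi'(y)|\,dy = 1$. ■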
In Sections 4.1 and 4.2, $h_t$ is given by (54) and $f$ is the bounded solution of the Stein equation (2) with $\mu = 0$, $\sigma^2 = 1$, and test function $h_t$. With $\|\cdot\|$ the sup norm, Lemma 3 of
gives 



$$\|f\| \le \sqrt{2\pi} \le 2.6 \quad \text{and} \quad \|f'\| \le 4. \qquad (58)$$

4.1 Proof of Theorem 1.1 (zero biasing)

Lemma 4.2 Let $Y$ be a mean-zero random variable with variance $\sigma^2$, and let $Y^*$ be defined on the same space as $Y$, with the $Y$-zero biased distribution, satisfying $|Y^* - Y|/\sigma \le A$ for some $A$. Then

$$\delta_t \le (6.6 + a)A + 2A^2 + \frac{1}{t}(2\delta A + aA^2) \quad \text{for all } t \in (0, 1).$$

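Proof: Let $W = Y/\sigma$, whence $W^* = Y^*/\sigma$ and $|W^* - W| \le A$. By differentiation in the Stein equation (2) and in (54), respectively, we have

$$f''(x) = f(x) + x f'(x) + h_t'(x), \quad \text{with} \quad h_t'(x) = -\frac{1}{t} \int h(x + ty)\varphi'(y)\,dy. \qquad (59)$$

By the Stein equation, the zero-bias characterization $EWf(W) = Ef'(W^*)$, and with $Nh_t = Eh_t(Z)$ for $Z$ a standard normal variable, we also have

$$|Eh_t(W) - Nh_t| = |E[f'(W^*) - f'(W)]| = \left| E \int_W^{W^*} f''(x)\,dx \right| = \left| E \int_W^{W^*} \bigl(f(x) + x f'(x) + h_t'(x)\bigr)\,dx \right|. \qquad (60)$$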



w 



Let $V = W^* - W$. Applying the triangle inequality in (60) and using (58), for the first term we find that

$$\left| E \int_W^{W^*} f(x)\,dx \right| \le 2.6\,E|V| \le 2.6A \qquad (61)$$

and for the second term, again using (58), and, now, $E|W| \le (EW^2)^{1/2} = 1$, we find that

$$\begin{aligned}
\left| E \int_W^{W^*} x f'(x)\,dx \right| &\le 4E\left| \int_W^{W+V} |x|\,dx \right| = 2E\bigl| (W + V)|W + V| - W|W| \bigr| \\
&\le E(4|WV| + 2V^2) \le 4A\,E|W| + 2A^2 \le 4A + 2A^2. \qquad (62)
\end{aligned}$$
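For the final term in (60), with $U \sim \mathcal{U}[0, 1]$ independent of $W$ and $V$, we write

$$\left| E \int_W^{W+V} h_t'(x)\,dx \right| = \left| EV \int_0^1 h_t'(W + uV)\,du \right| = |EV h_t'(W + UV)|.$$

Then, using $\int \varphi'(y)\,dy = 0$, and Lemma 4.1, we have

$$\begin{aligned}
|EV h_t'(W + UV)| &= \frac{1}{t} \left| EV \int h(W + UV + ty)\varphi'(y)\,dy \right| \\
&= \frac{1}{t} \left| EV \int [h(W + UV + ty) - h(W + UV)]\varphi'(y)\,dy \right| \\
&\le \frac{1}{t} E\left( |V| \int \bigl[h_{|V| + t|y|}^+(W) - h_{|V| + t|y|}^-(W)\bigr] |\varphi'(y)|\,dy \right) \le \frac{A}{t} E \int \tilde{h}_{A + t|y|}(W) |\varphi'(y)|\,dy \\
&\le \frac{1}{t} A(2\delta + a(A + t)) = \frac{1}{t}(2\delta A + aA^2) + aA. \qquad (63)
\end{aligned}$$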
By combining the bounds (61), (62), and (63) we complete the proof. ■

Proof of Theorem 1.1. Letting $t = \alpha A$ in Lemma 4.2 we have

$$\delta_t \le (6.6 + a)A + 2A^2 + \frac{1}{\alpha A}(2\delta A + aA^2) = \left(6.6 + a + \frac{a}{\alpha}\right)A + 2A^2 + \frac{2\delta}{\alpha}. \qquad (64)$$

Substituting (64) into the bound for $\delta$ given by Lemma 4.1 we have

$$\begin{aligned}
\delta &\le 2.8\left( \left(6.6 + a + \frac{a}{\alpha}\right)A + 2A^2 + \frac{2\delta}{\alpha} \right) + 4.7a\alpha A \\
&\le 18.5A + 2.8aA + 2.8\frac{aA}{\alpha} + 5.6A^2 + 5.6\frac{\delta}{\alpha} + 4.7a\alpha A,
\end{aligned}$$

meaning that

$$\delta \le A\,\frac{18.5 + 5.6A + 2.8a + 2.8a/\alpha + 4.7a\alpha}{1 - 5.6/\alpha}. \qquad (65)$$

Setting $\alpha = 2 \times 5.6$, for which $t < 1$ since $A < 1/12$, we obtain the bound claimed in Theorem 1.1 and, hence, the theorem. ■
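The effect of the choice of $\alpha$ on the constants can be tabulated with a few lines of arithmetic. The sketch below simply evaluates the coefficients of the bound $\delta \le A(18.5 + 5.6A + 2.8a + 2.8a/\alpha + 4.7a\alpha)/(1 - 5.6/\alpha)$ derived in the proof; the function name `zero_bias_coefficients` and the returned tuple layout are hypothetical.

```python
def zero_bias_coefficients(alpha):
    """Coefficients (c0, cA, ca) such that delta <= A * (c0 + cA * A + ca * a).

    Obtained by dividing each numerator term by 1 - 5.6 / alpha.
    """
    if alpha <= 5.6:
        raise ValueError("alpha must exceed 5.6 for the denominator to be positive")
    scale = 1.0 / (1.0 - 5.6 / alpha)
    c0 = scale * 18.5
    cA = scale * 5.6
    ca = scale * (2.8 + 2.8 / alpha + 4.7 * alpha)
    return c0, cA, ca


# alpha = 2 * 5.6, the choice made in the proof.
c0_2, cA_2, ca_2 = zero_bias_coefficients(2 * 5.6)
# alpha = 4 * 5.6, the alternative smoothing choice mentioned later in the section.
c0_4, cA_4, ca_4 = zero_bias_coefficients(4 * 5.6)
```

With $\alpha = 4 \times 5.6$ the three values are approximately 24.7, 7.5 and 144.3, which round up to the constants 25, 7.5 and 145 appearing in (75). A larger $\alpha$ lowers the constant term and the coefficient of $A$ but raises the coefficient of $a$, which is the compromise in the choice of smoothing parameter.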



4.2 Proof of Theorem 1.2 (size biasing) 

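Lemma 4.3 Let $Y \ge 0$ be a random variable with mean $\mu$ and variance $\sigma^2$, and let $Y^s$ be defined on the same space as $Y$, with the $Y$-size-biased distribution, satisfying $|Y^s - Y|/\sigma \le A$ for some $A$. Then for all $t \in (0, 1)$,

$$\delta_t \le \frac{\mu}{\sigma} \left( \frac{4\Delta}{\sigma} + \left(3.3 + \frac{1}{2}a\right)A^2 + \frac{2}{3}A^3 + \frac{1}{2t}(2\delta A^2 + aA^3) \right), \qquad (66)$$

with $\Delta$ as in Theorem 1.2.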

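Proof: With $W = (Y - \mu)/\sigma$, let $W^s = (Y^s - \mu)/\sigma$ (which is a slight abuse of notation). Then $|W^s - W| \le A$. Note that

$$EWf(W) = \frac{\mu}{\sigma} E(f(W^s) - f(W)), \qquad (67)$$

and, so, with $V = W^s - W$, we have

$$\begin{aligned}
Eh_t(W) - Nh_t &= E(f'(W) - Wf(W)) = E\left( f'(W) - \frac{\mu}{\sigma}(f(W^s) - f(W)) \right) \\
&= E\left( f'(W) - \frac{\mu}{\sigma} \int_W^{W+V} f'(x)\,dx \right) = E\left( f'(W) - \frac{\mu}{\sigma} V \int_0^1 f'(W + uV)\,du \right) \\
&= E\left( f'(W)\left(1 - \frac{\mu}{\sigma}V\right) \right) + \frac{\mu}{\sigma} E\left( V \int_0^1 [f'(W) - f'(W + uV)]\,du \right). \qquad (68)
\end{aligned}$$

Since $E(\mu V/\sigma) = \mu E(W^s - W)/\sigma = E(\mu Y^s - \mu Y)/\sigma^2 = 1$, for the first expectation in (68) we have

$$\left| E\left\{ f'(W)\, E\left( 1 - \frac{\mu}{\sigma}V \,\Big|\, W \right) \right\} \right| \le 4\frac{\mu}{\sigma} \sqrt{\mathrm{Var}(E(W^s - W \mid W))} = \frac{4\mu}{\sigma^2}\Delta, \qquad (69)$$

using (58) and the definition of $\Delta$. Now, using (59), we write the second expectation in (68) as

$$\frac{\mu}{\sigma} E\left( V \int_0^1 [f'(W) - f'(W + uV)]\,du \right) = -\frac{\mu}{\sigma} E\left( V \int_0^1 \int_W^{W+uV} f''(v)\,dv\,du \right) = -\frac{\mu}{\sigma} E\left( V \int_0^1 \int_W^{W+uV} \bigl(f(v) + v f'(v) + h_t'(v)\bigr)\,dv\,du \right). \qquad (70)$$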

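We apply the triangle inequality and bound the three resulting terms separately. For the expectation arising from the first term on the right-hand side of (70), by (58) we have

$$\left| E\left\{ \frac{\mu}{\sigma} V \int_0^1 \int_W^{W+uV} f(v)\,dv\,du \right\} \right| \le 2.6\frac{\mu}{\sigma} E\left\{ |V| \int_0^1 u|V|\,du \right\} \le 1.3\frac{\mu}{\sigma}A^2 \qquad (71)$$

and, for the second term, arguing as in (62) we have

$$\begin{aligned}
\left| E\,\frac{\mu}{\sigma} V \int_0^1 \int_W^{W+uV} v f'(v)\,dv\,du \right| &\le 2\frac{\mu}{\sigma} E|V| \int_0^1 \left| \int_W^{W+uV} 2|v|\,dv \right| du \\
&\le 2\frac{\mu}{\sigma} E|V| \int_0^1 (2u|WV| + u^2V^2)\,du \le 2\frac{\mu}{\sigma} A \int_0^1 (2Au\,E|W| + u^2A^2)\,du \\
&\le 2\frac{\mu}{\sigma} A(A + A^2/3). \qquad (72)
\end{aligned}$$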

a 

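For the last term in (70), the computation is more involved than, yet similar to, that for zero biasing. Beginning with the inner integral, we have

$$\int_W^{W+uV} h_t'(v)\,dv = uV \int_0^1 h_t'(W + xuV)\,dx$$

and using $\int \varphi'(y)\,dy = 0$, and Lemma 4.1, for the last term in (70) we have

$$\begin{aligned}
\left| \frac{\mu}{\sigma} E \int_0^1 \int_0^1 uV^2 h_t'(W + xuV)\,dx\,du \right| &= \frac{\mu}{\sigma t} \left| EV^2 \int_0^1 \int_0^1 \int u\,h(W + xuV + ty)\varphi'(y)\,dy\,dx\,du \right| \\
&= \frac{\mu}{\sigma t} \left| EV^2 \int_0^1 \int_0^1 \int u[h(W + xuV + ty) - h(W + xuV)]\varphi'(y)\,dy\,dx\,du \right| \\
&\le \frac{\mu}{\sigma t} E\left( V^2 \int_0^1 \int_0^1 \int u\bigl[h_{|V| + t|y|}^+(W) - h_{|V| + t|y|}^-(W)\bigr] |\varphi'(y)|\,dy\,dx\,du \right) \\
&\le \frac{\mu}{2\sigma t} A^2\, E\left( \int \tilde{h}_{A + t|y|}(W) |\varphi'(y)|\,dy \right) \\
&\le \frac{\mu}{2\sigma t} A^2 (2\delta + a(A + t)) \\
&= \frac{\mu}{2\sigma t}(2\delta A^2 + aA^3) + \frac{\mu}{2\sigma} aA^2. \qquad (73)
\end{aligned}$$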
2at 2a 
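By combining (69), (71), (72), and (73) we complete the proof. ■

Proof of Theorem 1.2. Applying Lemma 4.1 using the bound (66) on $\delta_t$, we have

$$\delta \le 2.8\frac{\mu}{\sigma} \left( \frac{4\Delta}{\sigma} + \left(3.3 + \frac{1}{2}a\right)A^2 + \frac{2}{3}A^3 + \frac{1}{2t}(2\delta A^2 + aA^3) \right) + 4.7at,$$

or,

$$\delta \le \frac{2.8(\mu/\sigma)\left( 4\Delta/\sigma + (3.3 + \frac{1}{2}a)A^2 + \frac{2}{3}A^3 + aA^3/(2t) \right) + 4.7at}{1 - 2.8\mu A^2/(\sigma t)}. \qquad (74)$$

Setting $t = 2 \times 2.8\mu A^2/\sigma$, such that $t < 1$ since $A < (\sigma/(6\mu))^{1/2}$, the bound of Theorem 1.2 now follows from

$$\begin{aligned}
\delta &\le 5.6\frac{\mu}{\sigma} \left( \frac{4\Delta}{\sigma} + \left(3.3 + \frac{1}{2}a\right)A^2 + \frac{2}{3}A^3 + \frac{aA\sigma}{2(5.6\mu)} \right) + 2(4.7)a\left(5.6\frac{\mu A^2}{\sigma}\right) \\
&\le \frac{aA}{2} + \frac{\mu}{\sigma}\left( (19 + 56a)A^2 + 4A^3 \right) + 23\frac{\mu\Delta}{\sigma^2}. \qquad ■
\end{aligned}$$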
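The rounding of the constants in the size-bias bound can be checked in the same way as for zero biasing. The sketch below assumes the smoothing choice $t = c\,\mu A^2/\sigma$ and evaluates the exact coefficients obtained by solving the inequality for $\delta$; the function name `size_bias_coefficients` and the dictionary keys are hypothetical labels for the five terms of the bound.

```python
from math import isclose


def size_bias_coefficients(c):
    """Exact coefficients of the size-bias bound for t = c * mu * A**2 / sigma.

    The resulting bound reads
        delta <= k["aA"] * a * A
                 + (mu / sigma) * ((k["A2"] + k["aA2"] * a) * A**2 + k["A3"] * A**3)
                 + k["Delta"] * mu * Delta / sigma**2.
    """
    if c <= 2.8:
        raise ValueError("c must exceed 2.8 for the denominator to be positive")
    scale = 1.0 / (1.0 - 2.8 / c)
    return {
        "aA": scale * 2.8 / (2 * c),        # from the a * A**3 / (2t) term
        "A2": scale * 2.8 * 3.3,
        "aA2": scale * (2.8 / 2 + 4.7 * c),  # a/2 term plus the 4.7 * a * t term
        "A3": scale * 2.8 * 2 / 3,
        "Delta": scale * 2.8 * 4,
    }


k2 = size_bias_coefficients(2 * 2.8)  # choice made in the proof
k4 = size_bias_coefficients(4 * 2.8)  # alternative choice leading to (76)
```

For $c = 2 \times 2.8$ the values are about 0.5, 18.5, 55.4, 3.7 and 22.4, consistent with the rounded constants $1/2$, 19, 56, 4 and 23 in the final display of the proof; for $c = 4 \times 2.8$ they are about $1/6$, 12.3, 72.1, 2.5 and 14.9, consistent with (76).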

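There are compromises in the choice of smoothing parameter; if we take $\alpha = 4 \times 5.6$ in (65) for $B < \sigma/48$, and $t = 4 \times 2.8\mu A^2/\sigma$ in (74) for $B < \sigma^{3/2}/(12\mu)^{1/2}$, the bounds of Theorems 1.1 and 1.2 become

$$\delta \le A(145a + 7.5A + 25) \qquad (75)$$

and

$$\delta \le \frac{aA}{6} + \frac{\mu}{\sigma}\left( (13 + 73a)A^2 + 2.5A^3 \right) + 15\frac{\mu\Delta}{\sigma^2}, \qquad (76)$$

respectively.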



5 Remarks 

The zero- and size-bias couplings both conform well to Stein's characterizing equation, and their use produces bounds on the distance of a random variable $Y$ to the normal in many instances. The couplings are adaptable to the situation; in particular, the size-biased coupling, previously used in [17] for global dependence, is applied here to handle cases of local dependence.

The applications in Section 2 illustrate how bounds on the distance $\delta$ from $Y$ to the normal can be generated using only a zero-bias coupling and a bound on $|Y^* - Y|$; in particular, the bounds do not depend on the often-difficult calculation of variances of conditional expectations of the form $\mathrm{Var}\{E(Y' - Y \mid Y)\}$, which appear in the exchangeable-pair and size-biased versions of Stein's method when coupling $Y$ to some $Y'$. It is hoped that this feature of the zero-bias method will motivate a better understanding of the construction of couplings of $Y^*$ to $Y$ in greater generality than those that depend on the existence of the exchangeable pair of Proposition 2.1. In particular, the applications in Section 3 show an evidently wider scope of applicability of the size bias coupling over the zero bias one, as it is presently understood.